How to calculate gradient from function in python - tuple IndexError? - python

I'm looking to find the gradient, at point x, for the following function:
f(x) = w1 * x1^2 + w2 * x2
My code so far:
def gradient(w1, w2, x):
gradient = w1 * (x[0]**2) + w2 * (x[1]**2)
return gradient
However, this doesn't work for the following example:
w1 = 5; w2 = 3; x = (1,)
I'm receiving this error:
IndexError: tuple index out of range
Does this mean one of my indices is wrong? I thought a tuple has two indices, 0 and 1. Apologies, I appreciate this may be a very basic question.

Your function works; you just passed a tuple with only one value, and it needs two. (1,) is a one-element tuple, so x[1] is out of range. If you meant the second coordinate to be zero, you have to write it explicitly: (1,) -> (1, 0)
def gradient(w1, w2, x):
    gradient = w1 * (x[0]**2) + w2 * (x[1]**2)
    return gradient
w1 = 5
w2 = 3
x = (1,0)
print(gradient(w1,w2,x))
Output:
5
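Note that the function above evaluates f itself (and squares x2, which f does not) rather than its gradient. As a minimal sketch, assuming the goal is the analytic gradient of f(x) = w1 * x1^2 + w2 * x2, the partial derivatives are 2 * w1 * x1 and w2. Unpacking x instead of indexing it also gives a clearer error ("not enough values to unpack") when the tuple has the wrong length:

```python
def gradient(w1, w2, x):
    """Analytic gradient of f(x) = w1 * x1**2 + w2 * x2 at the point x = (x1, x2)."""
    x1, x2 = x            # raises ValueError if x does not have exactly two elements
    df_dx1 = 2 * w1 * x1  # partial derivative with respect to x1
    df_dx2 = w2           # partial derivative with respect to x2 (f is linear in x2)
    return (df_dx1, df_dx2)


print(gradient(5, 3, (1, 0)))  # (10, 3)
```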

Related

How the gradient is calculated in pytorch

I have some example code. When I calculate dloss/dw by hand I get 8, but the following code gives me 16. Could someone explain why the gradient is 16?
import torch
x = torch.tensor(2.0)
y = torch.tensor(2.0)
w = torch.tensor(3.0, requires_grad=True)
# forward
y_hat = w * x
s = y_hat - y
loss = s**2
#backward
loss.backward()
print(w.grad)
I think you simply miscalculated.
The derivative of loss = (w * x - y) ^ 2 with respect to w is:
dloss/dw = 2 * (w * x - y) * x = 2 * (3 * 2 - 2) * 2 = 16
Keep in mind that back-propagation in neural networks works by applying the chain rule; I think you forgot the final factor of x in your derivative.
To be specific:
The chain rule says that d f(g(x))/dx = f'(g(x)) * g'(x) (differentiating with respect to x).
The whole loss function in your case is built like this:
loss(y_hat) = (y_hat - y)^2
y_hat(x) = w * x
thus: loss(y_hat(x)) = (y_hat(x) - y)^2
By the chain rule, its derivative is:
dloss(y_hat(x))/dw = loss'(y_hat(x)) * dy_hat(x)/dw
For any z:
loss'(z) = 2 * (z - y) * 1 and dy_hat(z)/dw = z
thus: dloss(y_hat(x))/dw = loss'(y_hat(x)) * dy_hat(x)/dw = 2 * (y_hat(x) - y) * x = 2 * (w * x - y) * x = 2 * (3 * 2 - 2) * 2 = 16
PyTorch records that each operation in your forward pass applies some function to its input, so your forward pass is loss(y_hat(x)). In the backward pass it starts from dloss/dloss = 1 and keeps applying the chain rule, once per recorded operation.
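The same computation can be traced by hand with plain Python floats, mirroring the forward and backward passes described above. This is a sketch of the arithmetic only, not of PyTorch's internals. The intermediate value dloss/dy_hat = 8 is exactly what you get if the final factor of x is left out, and a central-difference check confirms 16:

```python
# Forward pass with the values from the question
x, y, w = 2.0, 2.0, 3.0
y_hat = w * x          # 6.0
s = y_hat - y          # 4.0
loss = s ** 2          # 16.0

# Backward pass: start from dloss/dloss = 1 and apply the chain rule once per operation
grad_loss = 1.0
grad_s = grad_loss * 2 * s    # d(s**2)/ds = 2*s        -> 8.0
grad_y_hat = grad_s * 1.0     # d(y_hat - y)/dy_hat = 1 -> 8.0
grad_w = grad_y_hat * x       # d(w*x)/dw = x           -> 16.0


def loss_fn(w_value):
    """Loss as a function of w only, used for a finite-difference check."""
    return (w_value * x - y) ** 2


eps = 1e-6
numeric = (loss_fn(w + eps) - loss_fn(w - eps)) / (2 * eps)  # central difference
print(grad_w, round(numeric, 4))  # 16.0 16.0
```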

Using a list of floats for a loop

I'm trying to run a Runge-Kutta algorithm to approximate a differential equation. I want to loop over a list of values for a constant, A, in the function, run the algorithm for each item in the list and produce a graph. I keep getting the error "list indices must be integers or slices, not float". I tried converting the numbers in the list to integer fractions of each other, but that didn't work either. I'm unsure how to get around this error, as the fixes I found didn't work. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from math import pi
from numpy import arange
from matplotlib.pyplot import plot,show
wo = 1
w = 2 #defining wo, w, g1, Amplitude and steps
h = 0.001
g1 = 0.2
A = [0.1,0.25,0.5,0.7,0.75,0.85,0.95,1.00,1.02,1.031,1.033,1.035,1.05]
for item in list(A): #Converting list items into Float values
    [float(i) for i in A]
xpoints = arange(0,100,h)
tpoints = []
zpoints = []
t=0
x = 0
z = pi/2
for i in A: #Calls for items in Amplitude list to run algorithm
    def F(t, z, x): #Defining the differential equation
        return -g1 * z - (wo ** 2 + 2 * A[i] * np.cos(w * t)) * np.sin(x)
    for x in xpoints:
        tpoints.append(t)
        zpoints.append(z)
        m1 = z*h
        k1 = h*F(t,z,x) #setting up the runge-kutta algorithm
        m2 = h*(z+(k1/2))
        k2 = h*F(t+0.5*m1,z+0.5*m1,x+0.5*h)
        m3 = h*(z+0.5*k2)
        k3 = h*F(t+0.5*m2,z+0.5*m2,x+0.5*h)
        m4 = h*(z+0.5*k3)
        k4 = h*F(t+0.5*m3,z+0.5*m3,x+0.5*h)
        t += (m1+2*m2+2*m3+m4)/6
        z += (k1+2*k2+2*k3+k4)/6
    A += 1
    plot(xpoints,zpoints)
The numbers themselves don't need to be converted. The problem is how you iterate: with for i in A:, i is the actual value and not the index. So where you use A[i], you're trying to access index 0.1 of A. Instead, just replace A[i] with i in the last line of this snippet:
A = [0.1,0.25,0.5,0.7,0.75,0.85,0.95,1.00,1.02,1.031,1.033,1.035,1.05]
...
for i in A:
    def F(t, z, x):
        return -g1 * z - (wo ** 2 + 2 * A[i] * np.cos(w * t)) * np.sin(x)
This happens because i takes the values of the elements of A, not their indices. If you want to loop over A index by index, use:
for i in range(len(A)):
That works.
You will then get an error at A += 1 (a list cannot be incremented by an integer). I think that line was meant to be i += 1.
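Putting the fix together, here is a minimal sketch that iterates over the amplitude values directly and passes the amplitude into the equation as a parameter. It assumes the intended system is x' = z, z' = F(t, z, x) (x as the angle, z as the angular velocity), applies the classic fourth-order Runge-Kutta step to both variables, creates fresh output arrays for each amplitude, and uses a coarser step h = 0.01 to keep it quick. These modelling choices are assumptions, not taken from the question:

```python
import math

import numpy as np

wo, w, g1 = 1.0, 2.0, 0.2   # constants from the question
h = 0.01                    # step size (assumption: coarser than the question's 0.001, for speed)
A = [0.1, 0.25, 0.5, 0.7, 0.75, 0.85, 0.95, 1.00, 1.02, 1.031, 1.033, 1.035, 1.05]


def F(t, z, x, amp):
    """z' for the driven, damped pendulum; amp is passed in instead of indexing A."""
    return -g1 * z - (wo ** 2 + 2 * amp * math.cos(w * t)) * math.sin(x)


def rk4(amp, t_end=100.0, x0=0.0, z0=math.pi / 2):
    """Classic RK4 for the system x' = z, z' = F(t, z, x, amp)."""
    ts = np.arange(0.0, t_end, h)
    xs = np.empty_like(ts)
    zs = np.empty_like(ts)
    x, z = x0, z0
    for n, t in enumerate(ts):
        xs[n], zs[n] = x, z
        k1x, k1z = z, F(t, z, x, amp)
        k2x, k2z = z + 0.5 * h * k1z, F(t + 0.5 * h, z + 0.5 * h * k1z, x + 0.5 * h * k1x, amp)
        k3x, k3z = z + 0.5 * h * k2z, F(t + 0.5 * h, z + 0.5 * h * k2z, x + 0.5 * h * k2x, amp)
        k4x, k4z = z + h * k3z, F(t + h, z + h * k3z, x + h * k3x, amp)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        z += h * (k1z + 2 * k2z + 2 * k3z + k4z) / 6
    return ts, xs, zs


# Loop over the amplitude values themselves: amp is 0.1, 0.25, ..., never an index.
results = {amp: rk4(amp) for amp in A}
```

To get one graph per amplitude, plot each (ts, zs) pair from results with matplotlib and call show() once at the end.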

Calculate vector gradient without using a Python library

I am trying to find the gradient of the function
f(x) = w1 * x1^2 + w2 * x2
where x is a vector coordinate (x1,x2).
def gradient(w1, w2, x):
    x= (x1,x2)
    gradx1=2*w1*x1 + w2 * x2
    gradx2= w2 + w1 * x1^2
    return (gradx1, gradx2)
My code raises a NameError saying x1 is not defined when I call the function:
gradient(5, 6, (10,10))
First things first:
x1, x2 = x # unpack your coord tuple
And secondly:
gradx2= w2 + w1 * x1 ** 2 # or gradx2= w2 + w1 * x1 * x1
In Python, ^ is bitwise XOR. Exponentiation is **.
x is a tuple which you need to unpack like so:
x1, x2 = x
Rather than:
x = (x1, x2)
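With the unpacking and ** fixes the function runs, but the expressions still differ from the partial derivatives of f, which are df/dx1 = 2 * w1 * x1 and df/dx2 = w2. A quick way to check a hand-derived gradient without any library is a central-difference approximation. A minimal sketch (the helper names f and numerical_gradient are illustrative):

```python
def f(w1, w2, x):
    """f(x) = w1 * x1**2 + w2 * x2."""
    x1, x2 = x  # unpack the coordinate tuple
    return w1 * x1 ** 2 + w2 * x2


def numerical_gradient(w1, w2, x, eps=1e-6):
    """Central-difference estimate of (df/dx1, df/dx2), using only plain Python."""
    x1, x2 = x
    d1 = (f(w1, w2, (x1 + eps, x2)) - f(w1, w2, (x1 - eps, x2))) / (2 * eps)
    d2 = (f(w1, w2, (x1, x2 + eps)) - f(w1, w2, (x1, x2 - eps))) / (2 * eps)
    return (d1, d2)


print(numerical_gradient(5, 6, (10, 10)))  # approximately (100.0, 6.0)
```

The analytic values for gradient(5, 6, (10, 10)) are 2 * 5 * 10 = 100 and 6, which the estimate should match to several decimal places.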

Is there a faster way of repeating a chunk of code x times and taking an average?

Starting with:
a,b=np.ogrid[0:n+1:1,0:n+1:1]
B=np.exp(1j*(np.pi/3)*np.abs(a-b))
B[z,b] = np.exp(1j * (np.pi/3) * np.abs(z - b +x))
B[a,z] = np.exp(1j * (np.pi/3) * np.abs(a - z +x))
B[diag,diag]=1-1j/np.sqrt(3)
This produces an (n+1) x (n+1) grid that acts as a matrix.
n is just a number that sets the range of the indices, i.e. a and b both go from 0 up to n.
z is a constant index that I choose: row z and column z are replaced using the B[z,b] and B[a,z] formulas (essentially the same formula, but with a small number x added inside np.abs(a-b)).
The diagonal of the matrix is given by the bottom line:
B[diag,diag]=1-1j/np.sqrt(3)
where,
diag=np.arange(n+1)
I would like to repeat this code 50 times, where the only thing that changes is x, so that I end up with 50 versions of the matrix B. x is a random number between -0.8 and 0.8, drawn anew each time:
x=np.random.uniform(-0.8,0.8)
I want to generate 50 versions of B with random values of x each time and take a geometric average of the 50 versions of B using the definition:
def geo_mean(y):
    y = np.asarray(y)
    return np.prod(y ** (1.0 / y.shape[0]), axis=-1)
I have tried making B a function of some index and then using a for _ in range(): loop, but this doesn't work. Apart from copying and pasting the block 50 times and naming each copy B1, B2, B3, etc., I can't think of another way of doing this.
EDIT:
I'm now using part of a given solution in order to show clearly what I am looking for:
#A matrix with 50 random values between -0.8 and 0.8 to be used in the loop
X=np.random.uniform(-0.8,0.8, (50,1))
#constructing the base array before modification by random x values in position z
a,b = np.ogrid[0:n+1:1,0:n+1:1]
B = np.exp(1j * ( np.pi / 3) * np.abs( a - b ))
B[diag,diag] = 1 - 1j / np.sqrt(3)
#list to store all modified arrays
randomarrays = []
for i in range( 0,50 ):
    #copy array and modify it
    Bnew = np.copy( B )
    Bnew[z, b] = np.exp( 1j * ( np.pi / 3 ) * np.abs(z - b + X[i]))
    Bnew[a, z] = np.exp( 1j * ( np.pi / 3 ) * np.abs(a - z + X[i]))
    randomarrays.append(Bnew)
Bstack = np.dstack(randomarrays)
#calculate the geometric mean value along the axis that was the row in 2D arrays
B0 = geo_mean(Bstack)
In this example, every iteration of i uses the same value of X; I can't find a way to make each new iteration use the next value in X. I know Python has no ++ operator, but I don't know the Python equivalent. I want each iteration of the loop to use the next value of X, so that I can dstack all the matrices at the end and compute a geo_mean for each element of the stacked matrices.
One pedestrian way would be to use a list comprehension or generator expression:
>>> def f(n, z, x):
...     diag = np.arange(n+1)
...     a,b=np.ogrid[0:n+1:1,0:n+1:1]
...     B=np.exp(1j*(np.pi/3)*np.abs(a-b))
...     B[z,b] = np.exp(1j * (np.pi/3) * np.abs(z - b +x))
...     B[a,z] = np.exp(1j * (np.pi/3) * np.abs(a - z +x))
...     B[diag,diag]=1-1j/np.sqrt(3)
...     return B
...
>>> X = np.random.uniform(-0.8, 0.8, (10,))
>>> np.prod((*map(np.power, map(f, 10*(4,), 10*(2,), X), 10 * (1/10,)),), axis=0)
But in your concrete example we can do much better than that:
using the identity exp(a) * exp(b) = exp(a + b), we can replace the geometric mean after exponentiation with an arithmetic mean of the exponents before exponentiation. Some care is needed because the complex n-th root that occurs in the geometric mean is multivalued. In the code below, with do_it_pps_way=False, the angles are normalized to the range [-pi, pi) so that we always land on the same branch as the n-th root.
Please also note that the geo_mean function you provided is wrong: it fails the basic sanity check that averaging several copies of the same array should return that array. I've provided a better version (geo_mean_correct). It is still not perfect, but I think no perfect solution exists, because the complex root is not unique.
Because of this, I recommend taking the average before exponentiating. As long as the spread of your random values is less than pi, this gives a well-defined averaging procedure whose result is actually close to the samples.
import numpy as np
def f(n, z, X, do_it_pps_way=True):
    X = np.asanyarray(X)
    diag = np.arange(n+1)
    a,b=np.ogrid[0:n+1:1,0:n+1:1]
    B=np.exp(1j*(np.pi/3)*np.abs(a-b))
    X = X.reshape(-1,1,1)
    if do_it_pps_way:
        zbx = np.mean(np.abs(z-b+X), axis=0)
        azx = np.mean(np.abs(a-z+X), axis=0)
    else:
        zbx = np.mean((np.abs(z-b+X)+3) % 6 - 3, axis=0)
        azx = np.mean((np.abs(a-z+X)+3) % 6 - 3, axis=0)
    B[z,b] = np.exp(1j * (np.pi/3) * zbx)
    B[a,z] = np.exp(1j * (np.pi/3) * azx)
    B[diag,diag]=1-1j/np.sqrt(3)
    return B
def geo_mean(y):
    y = np.asarray(y)
    dim = len(y.shape)
    y = np.atleast_2d(y)
    v = np.prod(y, axis=0) ** (1.0 / y.shape[0])
    return v[0] if dim == 1 else v
def geo_mean_correct(y):
    y = np.asarray(y)
    return np.prod(y ** (1.0 / y.shape[0]), axis=0)
# demo: multiplying first and taking the root afterwards (geo_mean) is wrong
B = np.exp(1j * np.random.random((5, 5)))
# the geometric mean of four copies of the same array should be that array:
if not np.allclose(B, geo_mean([B, B, B, B])):
    print('geo_mean failed')
if np.allclose(B, geo_mean_correct([B, B, B, B])):
    print('but geo_mean_correct works')
n, z, m = 10, 3, 50
X = np.random.uniform(-0.8, 0.8, (m,))
B0 = f(n, z, X, do_it_pps_way=False)
B1 = np.prod((*map(np.power, map(f, m*(n,), m*(z,), X), m * (1/m,)),), axis=0)
B2 = geo_mean_correct([f(n, z, x) for x in X])
# This is the recommended way:
B_recommended = f(n, z, X, do_it_pps_way=True)
print()
print(np.allclose(B1, B0))
print(np.allclose(B2, B1))
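A small numerical check of the identity this answer relies on: for unit complex numbers exp(1j*theta) with angles well inside (-pi, pi), the element-wise geometric mean across samples (root each factor, then multiply, as geo_mean_correct does) equals exp(1j * mean(theta)). The random angles here are hypothetical test data, not the question's matrices:

```python
import numpy as np

rng = np.random.default_rng(0)                    # seeded for reproducibility
theta = rng.uniform(-0.8, 0.8, size=(50, 4, 4))   # 50 hypothetical arrays of angles
samples = np.exp(1j * theta)                      # unit complex numbers exp(i*theta)

m = samples.shape[0]                              # number of samples (axis 0)
geo = np.prod(samples ** (1.0 / m), axis=0)       # root each factor, then multiply
via_angles = np.exp(1j * theta.mean(axis=0))      # average the angles, then exponentiate

print(np.allclose(geo, via_angles))               # True
```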
I think you should rely more on numpy functionality when approaching your problem. I'm not a numpy expert myself, so there is surely room for improvement:
import numpy as np
from scipy.stats import gmean
n = 2
z = 1
a = np.arange(n + 1).reshape(1, n + 1)
#constructing the base array before modification by random x values in position z
B = np.exp(1j * (np.pi / 3) * np.abs(a - a.T))
B[a, a] = 1 - 1j / np.sqrt(3)
#list to store all modified arrays
random_arrays = []
for _ in range(50):
    #generate random x value
    x=np.random.uniform(-0.8, 0.8)
    #copy array and modify it
    B_new = np.copy(B)
    B_new[z, a] = np.exp(1j * (np.pi / 3) * np.abs(z - a + x))
    B_new[a, z] = np.exp(1j * (np.pi / 3) * np.abs(a - z + x))
    random_arrays.append(B_new)
#store all B arrays as a 3D array of shape (50, n+1, n+1)
B_stack = np.stack(random_arrays)
#calculate the geometric mean of each element across the 50 stacked arrays (axis 0)
geom_mean_per_element = gmean(B_stack, axis=0)
It uses the geometric mean function gmean from the scipy.stats module, which vectorises this calculation.
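Which axis to reduce depends on how the matrices were stacked: after np.stack the sample index comes first (axis 0), after np.dstack it comes last (axis -1). Note that the geo_mean in the question's edit divides by y.shape[0], which after np.dstack is n+1 rather than the number of stacked matrices. A minimal sketch with hypothetical positive stand-in matrices, computing the geometric mean as exp(mean(log)), the same reduction scipy.stats.gmean performs:

```python
import numpy as np

n, m = 2, 50
# Hypothetical stand-ins for the m modified matrices (random positive values)
rng = np.random.default_rng(1)
mats = [rng.uniform(0.5, 2.0, size=(n + 1, n + 1)) for _ in range(m)]

stacked = np.stack(mats)     # shape (m, n+1, n+1): sample index first
dstacked = np.dstack(mats)   # shape (n+1, n+1, m): sample index last


def geo_mean_along(arr, axis):
    """Element-wise geometric mean along `axis`, computed as exp(mean(log(arr)))."""
    return np.exp(np.mean(np.log(arr), axis=axis))


per_element = geo_mean_along(stacked, axis=0)      # reduce over the samples
per_element_d = geo_mean_along(dstacked, axis=-1)  # same reduction for the dstack layout

print(stacked.shape, dstacked.shape, per_element.shape)  # (50, 3, 3) (3, 3, 50) (3, 3)
print(np.allclose(per_element, per_element_d))           # True
```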

Non-linear least square minimization of 2 variables (different dimension) in python

I have a function of two variables k and T.
I have the value of the function for a number of (k, T) pairs. However, I do not have the same number of values for each variable. For example, I know the values f of the function at 2 values of T and 3 values of k:
F(k1,T1) = f1
F(k1,T2) = f2
F(k2,T1) = f3
F(k2,T2) = f4
F(k3,T1) = f5
F(k3,T2) = f6
I also know the form of the function F:
def func(X, a, b, c, omega):
    T,k = X # The two variables
    n = 1.0 / ( np.exp(omega / T ) - 1.0 )
    return a * k * n + b * k**2 * (n + 1.0)
I would like to find the values of a, b, c and omega that minimize the error.
I tried with curve_fit:
k = [k1,k2,k3]
T = [T1,T2]
F[k1,T1] = f1
F[k1,T2] = f2
F[k2,T1] = f3
F[k2,T2] = f4
F[k3,T1] = f5
F[k3,T2] = f6
popt, pcov = curve_fit(func, (T,k), F )
However I get the following error (in my practical case I have 19 k values and 4 T values):
return a * k * n + b * k**2 * (n + 1.0)
ValueError: operands could not be broadcast together with shapes (19,) (4,)
Now if I create an array of higher dimension:
X = np.zeros((4,19,2))
for ii in np.arange(19):
    X[0,ii,:] = np.array([T[0],k[ii]])
    X[1,ii,:] = np.array([T[1],k[ii]])
    X[2,ii,:] = np.array([T[2],k[ii]])
    X[3,ii,:] = np.array([T[3],k[ii]])
and pass that:
def func(X, a, b, c, omega):
    T = X[:,:,0]
    k = X[:,:,1]
    n = 1.0 / ( np.exp(omega / T ) - 1.0 )
    return a * k * n + b * k**2 * (n + 1.0)
popt, pcov = curve_fit(func, X, F )
then I get the following issue:
minpack.error: Result from function call is not a proper array of floats.
Thank you in advance.
You need an array of pairs of data with the input X (probably your original dataset already looks like that) and the corresponding output array F:
X = np.array([[k1,T1],[k1,T2],[k2,T1],[k2,T2],[k3,T1],[k3,T2]])
F = [f1,f2,f3,f4,f5,f6]
Then call curve_fit directly. Because func unpacks its first argument as T, k = X, pass the T column first:
popt, pcov = curve_fit(func, (X[:,1],X[:,0]),F)
Alternatively, you can use separate arrays for k and T in place of X[:,0] and X[:,1], but note that they must have the same length, since each element holds the k and T values of one observation/experiment. In other words, the index in the k or T array identifies the corresponding observation.
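If the data form a full grid (every T combined with every k, as in the 4 x 19 case), np.meshgrid followed by ravel produces matching flat arrays with one entry per observation. A minimal sketch with hypothetical T and k values and synthetic f values generated from func itself; real measurements would need to be flattened in the same order:

```python
import numpy as np


def func(X, a, b, c, omega):
    """Model from the question; c is accepted but not used."""
    T, k = X  # two arrays of equal shape, one entry per observation
    n = 1.0 / (np.exp(omega / T) - 1.0)
    return a * k * n + b * k**2 * (n + 1.0)


# Hypothetical grid: 4 temperatures and 19 k values, as in the question
T_values = np.linspace(100.0, 400.0, 4)
k_values = np.linspace(0.1, 1.9, 19)

# Pair every T with every k, then flatten so each observation has its own T and k
TT, KK = np.meshgrid(T_values, k_values, indexing="ij")  # both shape (4, 19)
T_flat, k_flat = TT.ravel(), KK.ravel()                  # both shape (76,)

# Synthetic "measurements" from made-up parameters, flattened in the same order
F_flat = func((TT, KK), 2.0, 0.5, 0.0, 150.0).ravel()    # shape (76,)

print(T_flat.shape, k_flat.shape, F_flat.shape)  # (76,) (76,) (76,)
```

These flat arrays can then be passed as curve_fit(func, (T_flat, k_flat), F_flat). Because c does not appear in func's return value, the data cannot determine it, and SciPy will typically warn that the covariance of the parameters could not be estimated; dropping c from the signature avoids that.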
