Improve speed of gradient descent

Improve speed of gradient descent - python

I am trying to maximize a target function f(x) with function scipy.optimize.minimum. But it usually takes 4-5 hrs to run the code because the function f(x) involves a lot of computation of complex matrix. To improve its speed, I want to use gpu. And I've already tried tensorflow package. Since I use numpy to define f(x), I have to convert it into tensorflow's format. However, it doesn't support the computation of complex matrix. What else package or means I can use? Any suggestions?
To specific my problem, I will show calculate scheme below:
Calculate the expectation :
-where H=x*H_0, x is the parameter
Let \phi go through the dynamics of Schrödinger equation
-Different H is correspond to a different \phi_end. Thus, parameter x determines the expectation
Change x, calculate the corresponding expectation
Find a specific x that minimize the expectation
Here is a simple example of part of my code:
import numpy as np
import cmath
from scipy.linalg import expm
import scipy.optimize as opt
# create initial complex matrixes
N = 2 # Dimension of matrix
H = np.array([[1.0 + 1.0j] * N] * N) # a complex matrix with shape(N, N)
A = np.array([[0.0j] * N] * N)
A[0][0] = 1.0 + 1j
# calculate the expectation
def value(phi):
exp_H = expm(H) # put the matrix in the exp function
new_phi = np.linalg.linalg.matmul(exp_H, phi)
# calculate the expectation of the matrix
x = np.linalg.linalg.matmul(H, new_phi)
expectation = np.inner(np.conj(phi), x)
return expectation
# Contants
tmax = 1
dt = 0.1
nstep = int(tmax/dt)
phi_init = [1.0 + 1.0j] * N
# 1st derivative of Schrödinger equation
def dXdt(t, phi, H): # 1st derivative of the function
return -1j * np.linalg.linalg.matmul(H, phi)
def f(X):
phi = [[0j] * N] * nstep # store every time's phi
phi[0] = phi_init
# phi go through the dynamics of Schrödinger equation
for i in range(nstep - 1):
phi[i + 1] = phi[i] - dXdt(i * dt, X[i] * H, phi[i]) * dt
# calculate the corresponding value
f_result = value(phi[-1])
return f_result
# Initialize the parameter
X0 = np.array(np.ones(nstep))
results = opt.minimize(f, X0) # minimize the target function
opt_x = results.x
PS:
Python Version: 3.7
Operation System: Win 10

Related

SKlearn Gaussian Process with constant, manually set correlation

I want to use the Gaussian Process approximation for a simple 1D test function to illustrate a few things. I want to iterate over a few different values for the correlation matrix (since this is 1D it is just a single value) and show what effect different values have on the approximation. My understanding is, that "theta" is the parameter for this. Therefore I want to set the theta value manually and don't want any optimization/changes to it. I thought the constant kernel and the clone_with_theta function might get me what I want but I didn't get it to work. Here is what I have so far:
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as ConstantKernel
def f(x):
"""The function to predict."""
return x/2 + ((1/10 + x) * np.sin(5*x - 1))/(1 + x**2 * (np.sin(x - (1/2))**2))
# ----------------------------------------------------------------------
# Data Points
X = np.atleast_2d(np.delete(np.linspace(-1,1, 7),4)).T
y = f(X).ravel()
# Instantiate a Gaussian Process model
kernel = ConstantKernel(constant_value=1, constant_value_bounds='fixed')
theta = np.array([0.5,0.5])
kernel = kernel.clone_with_theta(theta)
gp = GaussianProcessRegressor(kernel=kernel, optimizer=None)
# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, y)
# Make the prediction on the meshed x-axis (ask for MSE as well)
y_pred, sigma = gp.predict(x, return_std=True)
# Plot
# ...

I programmed a simple implementation myself now, which allows to set correlation (here 'b') manually:
import numpy as np
from numpy.linalg import inv
def f(x):
"""The function to predict."""
return x/2 + ((1/10 + x) * np.sin(5*x - 1))/(1 + x**2 * (np.sin(x - (1/2))**2))
def kriging_approx(x,xt,yt,b,mu,R_inv):
N = yt.size
one = np.matrix(np.ones((yt.size))).T
r = np.zeros((N))
for i in range(0,N):
r[i]= np.exp(-b * (xt[i]-x)**2)
y = mu + np.matmul(np.matmul(r.T,R_inv),yt - mu*one)
y = y[0,0]
return y
def calc_R (x,b):
N = x.size
# setup R
R = np.zeros((N,N))
for i in range(0,N):
for j in range(0,N):
R[i][j] = np.exp(-b * (x[i]-x[j])**2)
R_inv = inv(R)
return R, R_inv
def calc_mu_sig (yt, R_inv):
N = yt.size
one = np.matrix(np.ones((N))).T
mu = np.matmul(np.matmul(one.T,R_inv),yt) / np.matmul(np.matmul(one.T,R_inv),one)
mu = mu[0,0]
sig2 = (np.matmul(np.matmul((yt - mu*one).T,R_inv),yt - mu*one))/(N)
sig2 = sig2[0,0]
return mu, sig2
# ----------------------------------------------------------------------
# Data Points
xt = np.linspace(-1,1, 7)
yt = np.matrix((f(xt))).T
# Calc R
R, R_inv = calc_R(xt, b)
# Calc mu and sigma
mu_dach, sig_dach2 = calc_mu_sig(yt, R_inv)
# Point to get approximation for
x = 1
y_approx = kriging_approx(x, xt, yt, b, mu_dach, R_inv)

Is there a way to easily integrate a set of differential equations over a full grid of points?

The problem is that I would like to be able to integrate the differential equations starting for each point of the grid at once instead of having to loop over the scipy integrator for each coordinate. (I'm sure there's an easy way)
As background for the code I'm trying to solve the trajectories of a Couette flux alternating the direction of the velocity each certain period, that is a well known dynamical system that produces chaos. I don't think the rest of the code really matters as the part of the integration with scipy and my usage of the meshgrid function of numpy.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, writers
from scipy.integrate import solve_ivp
start_T = 100
L = 1
V = 1
total_run_time = 10*3
grid_points = 10
T_list = np.arange(start_T, 1, -1)
x = np.linspace(0, L, grid_points)
y = np.linspace(0, L, grid_points)
X, Y = np.meshgrid(x, y)
condition = True
totals = np.zeros((start_T, total_run_time, 2))
alphas = np.zeros(start_T)
i = 0
for T in T_list:
alphas[i] = L / (V * T)
solution = np.array([X, Y])
for steps in range(int(total_run_time/T)):
t = steps*T
if condition:
def eq(t, x):
return V * np.sin(2 * np.pi * x[1] / L), 0.0
condition = False
else:
def eq(t, x):
return 0.0, V * np.sin(2 * np.pi * x[1] / L)
condition = True
time_steps = np.arange(t, t + T)
xt = solve_ivp(eq, time_steps, solution)
solution = np.array([xt.y[0], xt.y[1]])
totals[i][t: t + T][0] = solution[0]
totals[i][t: t + T][1] = solution[1]
i += 1
np.save('alphas.npy', alphas)
np.save('totals.npy', totals)
The error given is :
ValueError: y0 must be 1-dimensional.
And it comes from the 'solve_ivp' function of scipy because it doesn't accept the format of the numpy function meshgrid. I know I could run some loops and get over it but I'm assuming there must be a 'good' way to do it using numpy and scipy. I accept advice for the rest of the code too.

Yes, you can do that, in several variants. The question remains if it is advisable.
To implement a generally usable ODE integrator, it needs to be abstracted from the models. Most implementations do that by having the state space a flat-array vector space, some allow a vector space engine to be passed as parameter, so that structured vector spaces can be used. The scipy integrators are not of this type.
So you need to translate the states to flat vectors for the integrator, and back to the structured state for the model.
def encode(X,Y): return np.concatenate([X.flatten(),Y.flatten()])
def decode(U): return U.reshape([2,grid_points,grid_points])
Then you can implement the ODE function as
def eq(t,U):
X,Y = decode(U)
Vec = V * np.sin(2 * np.pi * x[1] / L)
if int(t/T)%2==0:
return encode(Vec, np.zeros(Vec.shape))
else:
return encode(np.zeros(Vec.shape), Vec)
with initial value
U0 = encode(X,Y)
Then this can be directly integrated over the whole time span.
Why this might be not such a good idea: Thinking of each grid point and its trajectory separately, each trajectory has its own sequence of adapted time steps for the given error level. In integrating all simultaneously, the adapted step size is the minimum over all trajectories at the given time. Thus while the individual trajectories might have only short intervals with very small step sizes amid long intervals with sparse time steps, these can overlap in the ensemble to result in very small step sizes everywhere.
If you go beyond the testing stage, switch to a more compiled solver implementation, odeint is a Fortran code with wrappers, so half a solution. JITcode translates to C code and links with the compiled solver behind odeint. Leaving python you get sundials, the diffeq module of julia-lang, or boost::odeint.

TL;DR
I don't think you can "integrate the differential equations starting for each point of the grid at once".
MWE
Please try to provide a MWE to reproduce your problem, like you said : "I don't think the rest of the code really matters", and it makes it harder for people to understand your problem.
Understanding how to talk to the solver
Before answering your question, there are several things that seem to be misunderstood :
by defining time_steps = np.arange(t, t + T) and then calling solve_ivp(eq, time_steps, solution) : the second argument of solve_ivp is the time span you want the solution for, ie, the "start" and "stop" time as a 2-uple. Here your time_steps is 30-long (for the first loop), so I would probably replace it by (t, t+T). Look for t_span in the doc.
from what I understand, it seems like you want to control each iteration of the numerical resolution : that's not how solve_ivp works. More over, I think you want to switch the function "eq" at each iteration. Since you have to pass the "the right hand side" of the equation, you need to wrap this behavior inside a function. It would not work (see right after) but in terms of concept something like this:
def RHS(t, x):
# unwrap your variables, condition is like an additional variable of your problem,
# with a very simple differential equation
x0, x1, condition = x
# compute new results for x0 and x1
if condition:
x0_out, x1_out = V * np.sin(2 * np.pi * x[1] / L), 0.0
else:
x0_out, x1_out = 0.0, V * np.sin(2 * np.pi * x[1] / L)
# compute new result for condition
condition_out = not(condition)
return [x0_out, x1_out, condition_out]
This would not work because the evolution of condition doesn't satisfy some mathematical properties of derivation/continuity. So condition is like a boolean switch that parametrizes the model, we can use global to control the state of this boolean :
condition = True
def RHS_eq(t, y):
global condition
x0, x1 = y
# compute new results for x0 and x1
if condition:
x0_out, x1_out = V * np.sin(2 * np.pi * x1 / L), 0.0
else:
x0_out, x1_out = 0.0, V * np.sin(2 * np.pi * x1 / L)
# update condition
condition = 0 if condition==1 else 1
return [x0_out, x1_out]
finaly, and this is the ValueError you mentionned in your post : you define solution = np.array([X, Y]) which actually is initial condition and supposed to be "y0: array_like, shape (n,)" where n is the number of variable of the problem (in the case of [x0_out, x1_out] that would be 2)
A MWE for a single initial condition
All that being said, lets start with a simple MWE for a single starting point (0.5,0.5), so we have a clear view of how to use the solver :
import numpy as np
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt
# initial conditions for x0, x1, and condition
initial = [0.5, 0.5]
condition = True
# time span
t_span = (0, 100)
# constants
V = 1
L = 1
# define the "model", ie the set of equations of t
def RHS_eq(t, y):
global condition
x0, x1 = y
# compute new results for x0 and x1
if condition:
x0_out, x1_out = V * np.sin(2 * np.pi * x1 / L), 0.0
else:
x0_out, x1_out = 0.0, V * np.sin(2 * np.pi * x1 / L)
# update condition
condition = 0 if condition==1 else 1
return [x0_out, x1_out]
solution = solve_ivp(RHS_eq, # Right Hand Side of the equation(s)
t_span, # time span, a 2-uple
initial, # initial conditions
)
fig, ax = plt.subplots()
ax.plot(solution.t,
solution.y[0],
label="x0")
ax.plot(solution.t,
solution.y[1],
label="x1")
ax.legend()
Final answer
Now, what we want is to do the exact same thing but for various initial conditions, and from what I understand, we can't : again, quoting the doc
y0 : array_like, shape (n,) : Initial state. . The solver's initial condition only allows one starting point vector.
So to answer the initial question : I don't think you can "integrate the differential equations starting for each point of the grid at once".

Plotting Fourier Series coefficients in Python using Simpson's Rule

I want to 1. express Simpson's Rule as a general function for integration in python and 2. use it to compute and plot the Fourier Series coefficients of the function .
I've stolen and adapted this code for Simpson's Rule, which seems to work fine for integrating simple functions such as ,
or
Given period , the Fourier Series coefficients are computed as:
where k = 1,2,3,...
I am having difficulty figuring out how to express . I'm aware that since this function is odd, but I would like to be able to compute it in general for other functions.
Here's my attempt so far:
import matplotlib.pyplot as plt
from numpy import *
def f(t):
k = 1
for k in range(1,10000): #to give some representation of k's span
k += 1
return sin(t)*sin(k*t)
def trapezoid(f, a, b, n):
h = float(b - a) / n
s = 0.0
s += f(a)/2.0
for j in range(1, n):
s += f(a + j*h)
s += f(b)/2.0
return s * h
print trapezoid(f, 0, 2*pi, 100)
This doesn't give the correct answer of 0 at all since it increases as k increases and I'm sure I'm approaching it with tunnel vision in terms of the for loop. My difficulty in particular is with stating the function so that k is read as k = 1,2,3,...
The problem I've been given unfortunately doesn't specify what the coefficients are to be plotted against, but I am assuming it's meant to be against k.

Here's one way to do it, if you want to run your own integration or fourier coefficient determination instead of using numpy or scipy's built in methods:
import numpy as np
def integrate(f, a, b, n):
t = np.linspace(a, b, n)
return (b - a) * np.sum(f(t)) / n
def a_k(f, k):
def ker(t): return f(t) * np.cos(k * t)
return integrate(ker, 0, 2*np.pi, 2**10+1) / np.pi
def b_k(f, k):
def ker(t): return f(t) * np.sin(k * t)
return integrate(ker, 0, 2*np.pi, 2**10+1) / np.pi
print(b_k(np.sin, 0))
This gives the result
0.0
On a side note, trapezoid integration is not very useful for uniform time intervals. But if you desire:
def trap_integrate(f, a, b, n):
t = np.linspace(a, b, n)
f_t = f(t)
dt = t[1:] - t[:-1]
f_ab = f_t[:-1] + f_t[1:]
return 0.5 * np.sum(dt * f_ab)
There's also np.trapz if you want to use pre-builtin functionality. Similarly, there's also scipy.integrate.trapz

Density of multivariate t distribution in Python for large number of observations

I am trying to evaluate the density of multivariate t distribution of a 13-d vector. Using the dmvt function from the mvtnorm package in R, the result I get is
[1] 1.009831e-13
When i tried to write the function by myself in Python (thanks to the suggestions in this post:
multivariate student t-distribution with python), I realized that the gamma function was taking very high values (given the fact that I have n=7512 observations), making my function going out of range.
I tried to modify the algorithm, using the math.lgamma() and np.linalg.slogdet() functions to transform it to the log scale, but the result I got was
8.97669876e-15
This is the function that I used in python is the following:
def dmvt(x,mu,Sigma,df,d):
'''
Multivariate t-student density:
output:
the density of the given element
input:
x = parameter (d dimensional numpy array or scalar)
mu = mean (d dimensional numpy array or scalar)
Sigma = scale matrix (dxd numpy array)
df = degrees of freedom
d: dimension
'''
Num = math.lgamma( 1. *(d+df)/2 ) - math.lgamma( 1.*df/2 )
(sign, logdet) = np.linalg.slogdet(Sigma)
Denom =1/2*logdet + d/2*( np.log(pi)+np.log(df) ) + 1.*( (d+df)/2 )*np.log(1 + (1./df)*np.dot(np.dot((x - mu),np.linalg.inv(Sigma)), (x - mu)))
d = 1. * (Num - Denom)
return np.exp(d)
Any ideas why this functions does not produce the same results as the R equivalent?
Using as x = (0,0) produces similar results (up to a point, die to rounding) but with x = (1,1)1 I get a significant difference!

I finally managed to 'translate' the code from the mvtnorm package in R and the following script works without numerical underflows.
import numpy as np
import scipy.stats
import math
from math import lgamma
from numpy import matrix
from numpy import linalg
from numpy.linalg import slogdet
import scipy.special
from scipy.special import gammaln
mu = np.array([3,3])
x = np.array([1, 1])
Sigma = np.array([[1, 0], [0, 1]])
p=2
df=1
def dmvt(x, mu, Sigma, df, log):
'''
Multivariate t-student density. Returns the density
of the function at points specified by x.
input:
x = parameter (n x d numpy array)
mu = mean (d dimensional numpy array)
Sigma = scale matrix (d x d numpy array)
df = degrees of freedom
log = log scale or not
'''
p = Sigma.shape[0] # Dimensionality
dec = np.linalg.cholesky(Sigma)
R_x_m = np.linalg.solve(dec,np.matrix.transpose(x)-mu)
rss = np.power(R_x_m,2).sum(axis=0)
logretval = lgamma(1.0*(p + df)/2) - (lgamma(1.0*df/2) + np.sum(np.log(dec.diagonal())) \
+ p/2 * np.log(math.pi * df)) - 0.5 * (df + p) * math.log1p((rss/df) )
if log == False:
return(np.exp(logretval))
else:
return(logretval)
print(dmvt(x,mu,Sigma,df,True))
print(dmvt(x,mu,Sigma,df,False))

python divide by zero encountered in log - logistic regression

I'm trying to implement a multiclass logistic regression classifier that distinguishes between k different classes.
This is my code.
import numpy as np
from scipy.special import expit
def cost(X,y,theta,regTerm):
(m,n) = X.shape
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
return J
def gradient(X,y,theta,regTerm):
(m,n) = X.shape
grad = np.dot(((expit(np.dot(X,theta))).reshape(m,1) - y).T,X)/m + (np.concatenate(([0],theta[1:].T),axis=0)).reshape(1,n)
return np.asarray(grad)
def train(X,y,regTerm,learnRate,epsilon,k):
(m,n) = X.shape
theta = np.zeros((k,n))
for i in range(0,k):
previousCost = 0;
currentCost = cost(X,y,theta[i,:],regTerm)
while(np.abs(currentCost-previousCost) > epsilon):
print(theta[i,:])
theta[i,:] = theta[i,:] - learnRate*gradient(X,y,theta[i,:],regTerm)
print(theta[i,:])
previousCost = currentCost
currentCost = cost(X,y,theta[i,:],regTerm)
return theta
trX = np.load('trX.npy')
trY = np.load('trY.npy')
theta = train(trX,trY,2,0.1,0.1,4)
I can verify that cost and gradient are returning values that are in the right dimension (cost returns a scalar, and gradient returns a 1 by n row vector), but i get the error
RuntimeWarning: divide by zero encountered in log
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
why is this happening and how can i avoid this?

The proper solution here is to add some small epsilon to the argument of log function. What worked for me was
epsilon = 1e-5
def cost(X, y, theta):
m = X.shape[0]
yp = expit(X # theta)
cost = - np.average(y * np.log(yp + epsilon) + (1 - y) * np.log(1 - yp + epsilon))
return cost

You can clean up the formula by appropriately using broadcasting, the operator * for dot products of vectors, and the operator # for matrix multiplication — and breaking it up as suggested in the comments.
Here is your cost function:
def cost(X, y, theta, regTerm):
m = X.shape[0] # or y.shape, or even p.shape after the next line, number of training set
p = expit(X # theta)
log_loss = -np.average(y*np.log(p) + (1-y)*np.log(1-p))
J = log_loss + regTerm * np.linalg.norm(theta[1:]) / (2*m)
return J
You can clean up your gradient function along the same lines.
By the way, are you sure you want np.linalg.norm(theta[1:]). If you're trying to do L2-regularization, the term should be np.linalg.norm(theta[1:]) ** 2.

Cause:
This is happening because in some cases, whenever y[i] is equal to 1, the value of the Sigmoid function (theta) also becomes equal to 1.
Cost function:
J = (np.dot(-(y.T),np.log(expit(np.dot(X,theta))))-np.dot((np.ones((m,1))-y).T,np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1))))) / m + (regTerm / (2 * m)) * np.linalg.norm(theta[1:])
Now, consider the following part in the above code snippet:
np.log(np.ones((m,1)) - (expit(np.dot(X,theta))).reshape((m,1)))
Here, you are performing (1 - theta) when the value of theta is 1. So, that will effectively become log (1 - 1) = log (0) which is undefined.

I'm guessing your data has negative values in it. You can't log a negative.
import numpy as np
np.log(2)
> 0.69314718055994529
np.log(-2)
> nan
There are a lot of different ways to transform your data that should help, if this is the case.

def cost(X, y, theta):
yp = expit(X # theta)
cost = - np.average(y * np.log(yp) + (1 - y) * np.log(1 - yp))
return cost
The warning originates from np.log(yp) when yp==0 and in np.log(1 - yp) when yp==1. One option is to filter out these values, and not to pass them into np.log. The other option is to add a small constant to prevent the value from being exactly 0 (as suggested in one of the comments above)

Add epsilon value[which is a miniature value] to the log value so that it won't be a problem at all.
But i am not sure if it will give accurate results or not .

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Improve speed of gradient descent - python

Related

SKlearn Gaussian Process with constant, manually set correlation

Is there a way to easily integrate a set of differential equations over a full grid of points?

Plotting Fourier Series coefficients in Python using Simpson's Rule

Density of multivariate t distribution in Python for large number of observations

python divide by zero encountered in log - logistic regression

Categories

Resources