Solve Non Linear Equations Numerically - Python - python

I am looking to reproduce results from a research article.
I am at a point, where I have to find the maximum value of the following equation (w), and corresponding independent variable value (k). k is my only variable.
from sympy import *
import numpy as np
import math
rho_l = 1352;
rho_g= 1.225;
sigma = 0.029;
nu = 0.02;
Q = rho_g/ rho_l;
u = 99.67;
h = 1.6e-5; # Half sheet thickness
k = Symbol('k', real=True)
t = tanh(k*h);
w1 = -2 * nu * k**2 * t ;
w2 = 4* nu**2 * k**4 * t**2;
w3 = - Q**2 * u**2 * k**2;
w4 = -(t + Q)
w5 = (- Q* u**2 * k**2 + (sigma*k**3/ rho_l));
w6 = -w4;
w = ( w1 + sqrt(w2 + w3 + w4*w5))/w6;
I was able to solve this using Sympy - diff & solve functions, only when I give t = 1 or a any constant.
Do anyone have suggestions on finding the maximum value of this function? Numerically also works - however, I am not sure about the initial guess value. Good thing is I only have one independent variable.
Edit:
As per the answers given here regarding gradient descent, and also plotting and seeing the maximum value. I literally copied the code lines, that include plotting and I got a different plot.
Any thoughts on why this is happening? I am using Python 3.7

There are a bunch of ways to do this. scipy in particular has a bunch of optimization algorithms. I'm going to use gradient descent (or, perhaps more appropriately, gradient ascent) and autograd because it might be fun.
First, let's import autograd and turn your function into a callable function.
import autograd.numpy as anp
from autograd import grad
import matplotlib.pyplot as plt
def w(k):
rho_l = 1352;
rho_g= 1.225;
sigma = 0.029;
nu = 0.02;
Q = rho_g/ rho_l;
u = 99.67;
h = 1.6e-5; # Half sheet thickness
t = anp.tanh(k*h);
w1 = -2 * nu * k**2 * t ;
w2 = 4* nu**2 * k**4 * t**2;
w3 = - Q**2 * u**2 * k**2;
w4 = -(t + Q)
w5 = (- Q* u**2 * k**2 + (sigma*k**3/ rho_l));
w6 = -w4;
w = ( w1 + anp.sqrt(w2 + w3 + w4*w5))/w6;
return w
Now, we can use autograd to compute the gradient of w with respect to k. You can add some logic to ensure that the procedure will terminate once we meet some tolerance threshold.
dwdk = grad(w)
#Do gradient descent
k = 10.0 #An initial guess
learning_rate = 1
for i in range(1000):
k+=learning_rate*dwdk(k)
And now, let's plot the result to ensure we found the maximum
K = np.arange(0,1000)
plt.plot(K,w(K))
plt.scatter(k, w(k), color = 'red')
plt.show()

My thought is that the function returning values for w (your blue line) might be a truncated estimate (with a polynomial maybe)? Is the formula for w off by a factor of 10 or so?

Here is another method. It is an implementation of the Metropolis algorithm, a so-called Markov Chain Monte Carlo method. Using the definition of w, it is possible to construct a Markov Chain of w(k) called wlist. The tail of this chain should be the maximum of w, and we can recover k that got it by storing the k values in a list called kvalues.
import math
import random
klist = [1.0]
wlist = [w(1.0)] # initialize the chain
# you can tune the value passed to `range`
for _ in range(5000):
k = random.gauss(klist[-1], 0.2*klist[-1]) # q
if k <= 0.0: # assuming max has positive `k` arg
continue
w_hat = w(k)
if w_hat > wlist[-1]:
klist.append(k)
wlist.append(w_hat)
else:
u = random.random()
try:
alpha = math.exp(-w_hat) / math.exp(-wlist[-1])
except ZeroDivisionError:
alpha = 1.0
if u >= alpha:
klist.append(k)
wlist.append(w_hat)
else:
klist.append(klist[-1])
wlist.append(wlist[-1])
wlist[-10:], klist[-10:]
Which should return approximately (my seed is not set) something like this:
([8679.594992731532,
8679.594992731532,
8679.594992731532,
8679.594992731532,
8679.594992731532,
8679.594992731532,
8679.594992731532,
8679.594992731532,
8679.594992731532,
8679.594992731532],
[416.22335719432436,
416.22335719432436,
416.22335719432436,
416.22335719432436,
416.22335719432436,
416.22335719432436,
416.22335719432436,
416.22335719432436,
416.22335719432436,
416.22335719432436])

I do not expect that there is an analytical solution to this problem. There is a theory of pfaffian functions for which it is possible to provide certificate that there are no roots for given range. See https://en.m.wikipedia.org/wiki/Pfaffian_function.
However this heavy artilery.
If you are not sure about initial guess try to compute the function for a large number of random points (e.g.million) and select the best one as starting point. This approach works really well for low dimensional, differentiable problems

Related

I am newbie in python and doing coding for my physics project which requires to generate a matrix with a variable E

I am newbie in python and doing coding for my physics project which requires to generate a matrix with a variable E for which first element of the matrix has to be solved. Please help me. Thanks in advance.
Here is the part of code
import numpy as np
import pylab as pl
import math
import cmath
import sympy as sy
from scipy.optimize import fsolve
#Constants(Values at temp 10K)
hbar = 1.055E-34
m0=9.1095E-31 #free mass of electron
q= 1.602E-19
v = [0.510,0,0.510] # conduction band offset in eV
m1= 0.043 #effective mass in In_0.53Ga_0.47As
m2 = 0.072 #effective mass in Al_0.48In_0.52As
d = [-math.inf,100,math.inf] # dimension of structure in nanometers
'''scaling factor to with units of E in eV, mass in terms of free mass of electron, length in terms
of nanometers '''
s = (2*q*m0*1E-18)/(hbar)**2
#print('scaling factor is ',s)
E = sy.symbols('E') #Suppose energy of incoming particle is 0.3eV
m = [0.043,0.072,0.043] #effective mass of electrons in layers
for i in range(3):
print ('Effective mass of e in layer', i ,'is', m[i])
k=[ ] #Defining an array for wavevectors in different layers
for i in range(3):
k.append(sy.sqrt(s*m[i]*(E-v[i])))
print('Wave vector in layer',i,'is',k[i])
x = []
for i in range(2):
x.append((k[i+1]*m[i])/(k[i]*m[i+1]))
# print(x[i])
#Define Boundary condition matrix for two interfaces.
D0 = (1/2)*sy.Matrix([[1+x[0],1-x[0]], [1-x[0], 1+x[0]]], dtype = complex)
#print(D0)
#A = sy.matrix2numpy(D0,dtype=complex)
D1 = (1/2)*sy.Matrix([[1+x[1],1-x[1]], [1-x[1], 1+x[1]]], dtype = complex)
#print(D1)
#a=eye(3,3)
#print(a)
#Define Propagation matrix for 2nd layer or quantum well
#print(d[1])
#print(k[1])
P1 = 1*sy.Matrix([[sy.exp(-1j*k[1]*d[1]), 0],[0, sy.exp(1j*k[1]*d[1])]], dtype = complex)
#print(P1)
print("abs")
T= D0*P1*D1
#print('Transfer Matrix is given by:',T)
#print('Dimension of tranfer matrix T is' ,T.shape)
#print(T[0,0]
# I want to solve T{0,0} = 0 equation for E
def f(x):
return T[0,0]
x0= 0.5 #intial guess
x = fsolve(f, x0)
print("E is",x)
'''
y=sy.Eq(T[0,0],0)
z=sy.solve(y,E)
print('z',z)
'''
**The main part i guess is the part of the code where i am trying to solve the equation.***Steps I am following:
Defining a symbol E by using sympy
Generating three matrices which involves sum formulae and with variable E
Generating a matrix T my multiplying those 3 matrices,note that elements are complex and involves square roots of negative number.
I need to solve first element of this matrix T[0,0]=0,for variable E and find out value of E. I used fsolve for soving T[0,0]=0.*
Just a note for future questions, please leave out unused imports such as numpy and leave out zombie code like # a = eye(3,3). This helps keep the code as clean and short as possible. Also, the sample code would not run because of indentation problems, so when you copy and paste code, make sure it works before you do so. Always try to make your questions as short and modular as possible.
The expression of T[0,0] is too complex to solve analytically by SymPy so numerical approximation is needed. This leaves 2 options:
using SciPy's solvers which are advanced but require type casting to float values since SciPy does not deal with SymPy objects in any way.
using SymPy's root solvers which are less advanced but are probably simpler to use.
Both of these will only ever produce a single number as output since you can't expect numeric solvers to find every root. If you wanted to find more than one, then I advise that you use a list of points that you want to use as initial values, input each of them into the solvers and keep track of the distinct outputs. This will however never guarantee that you have obtained every root.
Only mix SciPy and SymPy if you are comfortable using both with no problems. SciPy doesn't play at all with SymPy and you should only have list, float, and complex instances when working with SciPy.
import math
import sympy as sy
from scipy.optimize import newton
# Constants(Values at temp 10K)
hbar = 1.055E-34
m0 = 9.1095E-31 # free mass of electron
q = 1.602E-19
v = [0.510, 0, 0.510] # conduction band offset in eV
m1 = 0.043 # effective mass in In_0.53Ga_0.47As
m2 = 0.072 # effective mass in Al_0.48In_0.52As
d = [-math.inf, 100, math.inf] # dimension of structure in nanometers
'''scaling factor to with units of E in eV, mass in terms of free mass of electron, length in terms
of nanometers '''
s = (2 * q * m0 * 1E-18) / hbar ** 2
E = sy.symbols('E') # Suppose energy of incoming particle is 0.3eV
m = [0.043, 0.072, 0.043] # effective mass of electrons in layers
for i in range(3):
print('Effective mass of e in layer', i, 'is', m[i])
k = [] # Defining an array for wavevectors in different layers
for i in range(3):
k.append(sy.sqrt(s * m[i] * (E - v[i])))
print('Wave vector in layer', i, 'is', k[i])
x = []
for i in range(2):
x.append((k[i + 1] * m[i]) / (k[i] * m[i + 1]))
# Define Boundary condition matrix for two interfaces.
D0 = (1 / 2) * sy.Matrix([[1 + x[0], 1 - x[0]], [1 - x[0], 1 + x[0]]], dtype=complex)
D1 = (1 / 2) * sy.Matrix([[1 + x[1], 1 - x[1]], [1 - x[1], 1 + x[1]]], dtype=complex)
# Define Propagation matrix for 2nd layer or quantum well
P1 = 1 * sy.Matrix([[sy.exp(-1j * k[1] * d[1]), 0], [0, sy.exp(1j * k[1] * d[1])]], dtype=complex)
print("abs")
T = D0 * P1 * D1
# did not converge for 0.5
x0 = 0.75
# method 1:
def f(e):
# evaluate T[0,0] at e and remove all sympy related things.
result = complex(T[0, 0].replace(E, e))
return result
solution1 = newton(f, x0)
print(solution1)
# method 2:
solution2 = sy.nsolve(T[0,0], E, x0)
print(solution2)
This prints:
(0.7533104353644469-0.023775286117722193j)
1.00808496181754 - 0.0444042144405285*I
Note that the first line is a native Python complex instance while the second is an instance of SymPy's complex number. One can convert the second simply with print(complex(solution2)).
Now, you'll notice that they produce different numbers but both are correct. This function seems to have a lot of zeros as can be shown from the Geogebra plot:
The red axis is Re(E), green is Im(E) and blue is |T[0,0]|. Each of those "spikes" are probably zeros.

Improve speed of gradient descent

I am trying to maximize a target function f(x) with function scipy.optimize.minimum. But it usually takes 4-5 hrs to run the code because the function f(x) involves a lot of computation of complex matrix. To improve its speed, I want to use gpu. And I've already tried tensorflow package. Since I use numpy to define f(x), I have to convert it into tensorflow's format. However, it doesn't support the computation of complex matrix. What else package or means I can use? Any suggestions?
To specific my problem, I will show calculate scheme below:
Calculate the expectation :
-where H=x*H_0, x is the parameter
Let \phi go through the dynamics of Schrödinger equation
-Different H is correspond to a different \phi_end. Thus, parameter x determines the expectation
Change x, calculate the corresponding expectation
Find a specific x that minimize the expectation
Here is a simple example of part of my code:
import numpy as np
import cmath
from scipy.linalg import expm
import scipy.optimize as opt
# create initial complex matrixes
N = 2 # Dimension of matrix
H = np.array([[1.0 + 1.0j] * N] * N) # a complex matrix with shape(N, N)
A = np.array([[0.0j] * N] * N)
A[0][0] = 1.0 + 1j
# calculate the expectation
def value(phi):
exp_H = expm(H) # put the matrix in the exp function
new_phi = np.linalg.linalg.matmul(exp_H, phi)
# calculate the expectation of the matrix
x = np.linalg.linalg.matmul(H, new_phi)
expectation = np.inner(np.conj(phi), x)
return expectation
# Contants
tmax = 1
dt = 0.1
nstep = int(tmax/dt)
phi_init = [1.0 + 1.0j] * N
# 1st derivative of Schrödinger equation
def dXdt(t, phi, H): # 1st derivative of the function
return -1j * np.linalg.linalg.matmul(H, phi)
def f(X):
phi = [[0j] * N] * nstep # store every time's phi
phi[0] = phi_init
# phi go through the dynamics of Schrödinger equation
for i in range(nstep - 1):
phi[i + 1] = phi[i] - dXdt(i * dt, X[i] * H, phi[i]) * dt
# calculate the corresponding value
f_result = value(phi[-1])
return f_result
# Initialize the parameter
X0 = np.array(np.ones(nstep))
results = opt.minimize(f, X0) # minimize the target function
opt_x = results.x
PS:
Python Version: 3.7
Operation System: Win 10

Is there a way to easily integrate a set of differential equations over a full grid of points?

The problem is that I would like to be able to integrate the differential equations starting for each point of the grid at once instead of having to loop over the scipy integrator for each coordinate. (I'm sure there's an easy way)
As background for the code I'm trying to solve the trajectories of a Couette flux alternating the direction of the velocity each certain period, that is a well known dynamical system that produces chaos. I don't think the rest of the code really matters as the part of the integration with scipy and my usage of the meshgrid function of numpy.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, writers
from scipy.integrate import solve_ivp
start_T = 100
L = 1
V = 1
total_run_time = 10*3
grid_points = 10
T_list = np.arange(start_T, 1, -1)
x = np.linspace(0, L, grid_points)
y = np.linspace(0, L, grid_points)
X, Y = np.meshgrid(x, y)
condition = True
totals = np.zeros((start_T, total_run_time, 2))
alphas = np.zeros(start_T)
i = 0
for T in T_list:
alphas[i] = L / (V * T)
solution = np.array([X, Y])
for steps in range(int(total_run_time/T)):
t = steps*T
if condition:
def eq(t, x):
return V * np.sin(2 * np.pi * x[1] / L), 0.0
condition = False
else:
def eq(t, x):
return 0.0, V * np.sin(2 * np.pi * x[1] / L)
condition = True
time_steps = np.arange(t, t + T)
xt = solve_ivp(eq, time_steps, solution)
solution = np.array([xt.y[0], xt.y[1]])
totals[i][t: t + T][0] = solution[0]
totals[i][t: t + T][1] = solution[1]
i += 1
np.save('alphas.npy', alphas)
np.save('totals.npy', totals)
The error given is :
ValueError: y0 must be 1-dimensional.
And it comes from the 'solve_ivp' function of scipy because it doesn't accept the format of the numpy function meshgrid. I know I could run some loops and get over it but I'm assuming there must be a 'good' way to do it using numpy and scipy. I accept advice for the rest of the code too.
Yes, you can do that, in several variants. The question remains if it is advisable.
To implement a generally usable ODE integrator, it needs to be abstracted from the models. Most implementations do that by having the state space a flat-array vector space, some allow a vector space engine to be passed as parameter, so that structured vector spaces can be used. The scipy integrators are not of this type.
So you need to translate the states to flat vectors for the integrator, and back to the structured state for the model.
def encode(X,Y): return np.concatenate([X.flatten(),Y.flatten()])
def decode(U): return U.reshape([2,grid_points,grid_points])
Then you can implement the ODE function as
def eq(t,U):
X,Y = decode(U)
Vec = V * np.sin(2 * np.pi * x[1] / L)
if int(t/T)%2==0:
return encode(Vec, np.zeros(Vec.shape))
else:
return encode(np.zeros(Vec.shape), Vec)
with initial value
U0 = encode(X,Y)
Then this can be directly integrated over the whole time span.
Why this might be not such a good idea: Thinking of each grid point and its trajectory separately, each trajectory has its own sequence of adapted time steps for the given error level. In integrating all simultaneously, the adapted step size is the minimum over all trajectories at the given time. Thus while the individual trajectories might have only short intervals with very small step sizes amid long intervals with sparse time steps, these can overlap in the ensemble to result in very small step sizes everywhere.
If you go beyond the testing stage, switch to a more compiled solver implementation, odeint is a Fortran code with wrappers, so half a solution. JITcode translates to C code and links with the compiled solver behind odeint. Leaving python you get sundials, the diffeq module of julia-lang, or boost::odeint.
TL;DR
I don't think you can "integrate the differential equations starting for each point of the grid at once".
MWE
Please try to provide a MWE to reproduce your problem, like you said : "I don't think the rest of the code really matters", and it makes it harder for people to understand your problem.
Understanding how to talk to the solver
Before answering your question, there are several things that seem to be misunderstood :
by defining time_steps = np.arange(t, t + T) and then calling solve_ivp(eq, time_steps, solution) : the second argument of solve_ivp is the time span you want the solution for, ie, the "start" and "stop" time as a 2-uple. Here your time_steps is 30-long (for the first loop), so I would probably replace it by (t, t+T). Look for t_span in the doc.
from what I understand, it seems like you want to control each iteration of the numerical resolution : that's not how solve_ivp works. More over, I think you want to switch the function "eq" at each iteration. Since you have to pass the "the right hand side" of the equation, you need to wrap this behavior inside a function. It would not work (see right after) but in terms of concept something like this:
def RHS(t, x):
# unwrap your variables, condition is like an additional variable of your problem,
# with a very simple differential equation
x0, x1, condition = x
# compute new results for x0 and x1
if condition:
x0_out, x1_out = V * np.sin(2 * np.pi * x[1] / L), 0.0
else:
x0_out, x1_out = 0.0, V * np.sin(2 * np.pi * x[1] / L)
# compute new result for condition
condition_out = not(condition)
return [x0_out, x1_out, condition_out]
This would not work because the evolution of condition doesn't satisfy some mathematical properties of derivation/continuity. So condition is like a boolean switch that parametrizes the model, we can use global to control the state of this boolean :
condition = True
def RHS_eq(t, y):
global condition
x0, x1 = y
# compute new results for x0 and x1
if condition:
x0_out, x1_out = V * np.sin(2 * np.pi * x1 / L), 0.0
else:
x0_out, x1_out = 0.0, V * np.sin(2 * np.pi * x1 / L)
# update condition
condition = 0 if condition==1 else 1
return [x0_out, x1_out]
finaly, and this is the ValueError you mentionned in your post : you define solution = np.array([X, Y]) which actually is initial condition and supposed to be "y0: array_like, shape (n,)" where n is the number of variable of the problem (in the case of [x0_out, x1_out] that would be 2)
A MWE for a single initial condition
All that being said, lets start with a simple MWE for a single starting point (0.5,0.5), so we have a clear view of how to use the solver :
import numpy as np
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt
# initial conditions for x0, x1, and condition
initial = [0.5, 0.5]
condition = True
# time span
t_span = (0, 100)
# constants
V = 1
L = 1
# define the "model", ie the set of equations of t
def RHS_eq(t, y):
global condition
x0, x1 = y
# compute new results for x0 and x1
if condition:
x0_out, x1_out = V * np.sin(2 * np.pi * x1 / L), 0.0
else:
x0_out, x1_out = 0.0, V * np.sin(2 * np.pi * x1 / L)
# update condition
condition = 0 if condition==1 else 1
return [x0_out, x1_out]
solution = solve_ivp(RHS_eq, # Right Hand Side of the equation(s)
t_span, # time span, a 2-uple
initial, # initial conditions
)
fig, ax = plt.subplots()
ax.plot(solution.t,
solution.y[0],
label="x0")
ax.plot(solution.t,
solution.y[1],
label="x1")
ax.legend()
Final answer
Now, what we want is to do the exact same thing but for various initial conditions, and from what I understand, we can't : again, quoting the doc
y0 : array_like, shape (n,) : Initial state. . The solver's initial condition only allows one starting point vector.
So to answer the initial question : I don't think you can "integrate the differential equations starting for each point of the grid at once".

Have I implemented Milstein's method/Euler-Maruyama correctly?

I have an stochastic differential equation (SDE) that I am trying to solve using Milsteins method but am getting results that disagree with experiment.
The SDE is
which I have broken up into 2 first order equations:
eq1:
eq2:
Then I have used the Ito form:
So that for eq1:
and for eq2:
My python code used to attempt to solve this is like so:
# set constants from real data
Gamma0 = 4000 # defines enviromental damping
Omega0 = 75e3*2*np.pi # defines the angular frequency of the motion
eta = 0 # set eta 0 => no effect from non-linear p*q**2 term
T_0 = 300 # temperature of enviroment
k_b = scipy.constants.Boltzmann
m = 3.1e-19 # mass of oscillator
# set a and b functions for these 2 equations
def a_p(t, p, q):
return -(Gamma0 - Omega0*eta*q**2)*p
def b_p(t, p, q):
return np.sqrt(2*Gamma0*k_b*T_0/m)
def a_q(t, p, q):
return p
# generate time data
dt = 10e-11
tArray = np.arange(0, 200e-6, dt)
# initialise q and p arrays and set initial conditions to 0, 0
q0 = 0
p0 = 0
q = np.zeros_like(tArray)
p = np.zeros_like(tArray)
q[0] = q0
p[0] = p0
# generate normally distributed random numbers
dwArray = np.random.normal(0, np.sqrt(dt), len(tArray)) # independent and identically distributed normal random variables with expected value 0 and variance dt
# iterate through implementing Milstein's method (technically Euler-Maruyama since b' = 0
for n, t in enumerate(tArray[:-1]):
dw = dwArray[n]
p[n+1] = p[n] + a_p(t, p[n], q[n])*dt + b_p(t, p[n], q[n])*dw + 0
q[n+1] = q[n] + a_q(t, p[n], q[n])*dt + 0
Where in this case p is velocity and q is position.
I then get the following plots of q and p:
I expected the resulting plot of position to look something like the following, which I get from experimental data (from which the constants used in the model are determined):
Have I implemented Milstein's method correctly?
If I have, what else might be wrong my process of solving the SDE that'd causing this disagreement with the experiment?
You missed a term in the drift coefficient, note that to the right of dp there are two dt terms. Thus
def a_p(t, p, q):
return -(Gamma0 - Omega0*eta*q**2)*p - Omega0**2*q
which is actually the part that makes the oscillator into an oscillator. With that corrected the solution looks like
And no, you did not implement the Milstein method as there are no derivatives of b_p which are what distinguishes Milstein from Euler-Maruyama, the missing term is +0.5*b'(X)*b(X)*(dW**2-dt).
There is also a derivative-free version of Milsteins method as a two-stage kind-of Runge-Kutta method, documented in wikipedia or the original in arxiv.org (PDF).
The step there is (vector based, duplicate into X=[p,q], K1=[k1_p,k1_q] etc. to be close to your conventions)
S = random_choice_of ([-1,1])
K1 = a(X )*dt + b(X )*(dW - S*sqrt(dt))
Xh = X + K1
K2 = a(Xh)*dt + b(Xh)*(dW + S*sqrt(dt))
X = X + 0.5 * (K1+K2)

scipy.optimize solution using python for the following equation

I am very new to scipy and doing data analysis in python. I am trying to solve the following regularized optimization problem and unfortunately I haven't been able to make too much sense from the scipy documentation. I am looking to solve the following constrained optimization problem using scipy.optimize
Here is the function I am looking to minimize:
here A is an m X n matrix , the first term in the minimization is the residual sum of squares, the second is the matrix frobenius (L2 norm) of a sparse n X n matrix W, and the third one is an L1 norm of the same matrix W.
In the function A is an m X n matrix , the first term in the minimization is the residual sum of squares, the second term is the matrix frobenius (L2 norm) of a sparse n X n matrix W, and the third one is an L1 norm of the same matrix W.
I would like to know how to minimize this function subject to the constraints that:
wj >= 0
wj,j = 0
I would like to use coordinate descent (or any other method that scipy.optimize provides) to solve the above problem. I would like so direction on how to achieve this as I have no idea how to take the frobenius norm or how to tune the parameters beta and lambda or whether the scipy.optimize will tune and return the parameters for me. Any help regarding these questions would be much appreciated.
Thanks in advance!
How large is m and n?
Here is a basic example for how to use fmin:
from scipy import optimize
import numpy as np
m = 5
n = 3
a = np.random.rand(m, n)
idx = np.arange(n)
def func(w, beta, lam):
w = w.reshape(n, n)
w2 = np.abs(w)
w2[idx, idx] = 0
return 0.5*((a - np.dot(a, w2))**2).sum() + lam*w2.sum() + 0.5*beta*(w2**2).sum()
w = optimize.fmin(func, np.random.rand(n*n), args=(0.1, 0.2))
w = w.reshape(n, n)
w[idx, idx] = 0
w = np.abs(w)
print w
If you want to use coordinate descent, you can implement it by theano.
http://deeplearning.net/software/theano/
Your problem seems tailor-made for cvxopt - http://cvxopt.org/
and in particular
http://cvxopt.org/userguide/solvers.html#problems-with-nonlinear-objectives
using fmin would likely be slower, since it does not take advantage of gradient / Hessian information.
The code in HYRY's answer also has the drawback that as far as fmin is concerned the diagonal W is a variable and fmin would try to move the W-diagonal values around until it realizes that they don't do anything (since the objective function resets them to zero). Here is the implementation in cvxopt of HYRY's code that explicitly enforces the zero-constraints and uses gradient info, WARNING: I couldn't derive the Hessian for your objective... and you might double-check the gradient as well:
'''CVXOPT version:'''
from numpy import *
from cvxopt import matrix, mul
''' warning: CVXOPT uses column-major order (Fortran) '''
m = 5
n = 3
n_active = (n)*(n-1)
A = matrix(random.rand(m*n),(m,n))
ids = arange(n)
beta = 0.1;
lam = 0.2;
W = matrix(zeros(n*n), (n,n));
def cvx_objective_func(w=None, z=None):
if w is None:
num_nonlinear_constraints = 0;
w_0 = matrix(1, (n_active,1), 'd');
return num_nonlinear_constraints, w_0
#main call:
'calculate objective:'
'form W matrix, warning _w is column-major order (Fortran)'
'''column-major order!'''
_w = matrix(w, (n, n-1))
for k in xrange(n):
W[k, 0:k] = _w[k, 0:k]
W[k, k+1:n] = _w[k, k:n-1]
squared_error = A - A*W
objective_value = .5 * sum( mul(squared_error,squared_error)) +\
.5* beta*sum(mul(W,W)) +\
lam * sum(abs(W));
'not sure if i calculated this right...'
_Df = -A.T*(squared_error) + beta*W + lam;
'''column-major order!'''
Df = matrix(0., (1, n*(n-1)))
for jdx in arange(n):
for idx in list(arange(0,jdx)) + list(arange(jdx+1,n)):
idx = int(idx);
jdx = int(jdx)
Df[0, jdx*(n-1) + idx] = _Df[idx, jdx]
if z is None:
return objective_value, Df
'''Also form hessian of objective+non-linear constraints
(but there are no nonlinear constraints) :
This is the trickiest part...
WARNING: H is for sure coded wrong'''
H = matrix(1., (n_active, n_active))
return objective_value, Df, H
m, w_0 = cvx_objective_func()
print cvx_objective_func(w_0)
G = -matrix(diag(ones(n_active),), (n_active,n_active))
h = matrix(0., (n_active,1), 'd')
from cvxopt import solvers
print solvers.cp(cvx_objective_func, G=G, h=h)
having said that, the tricks to eliminate the equality/inequality constraints in HYRY's code are quite cute

Categories

Resources