Python: least square fit with side conditions on fit-parameters

Python: least square fit with side conditions on fit-parameters - python

I have a timeseries that I want to fit to function using Scipy.optimize.leastsq.
fitfunc= lambda a, x: a[0]+a[1]*exp(-x/a[4])+a[2]*exp(-x/a[5])+a[3]*exp(-x /a[6])
errfunc lambda a,x,y: fitfunc(a,x) - y
Next I would pass errfunc to leastsq to minimze it. The fit-function I use is a sum of exponentials decaying with different timescales a(4:6) and different weights (a(0:4)). (as a sideuqestion: can I use leastsq with more than 1 parameter arrays? I didn't suceed to do so....)
The question: How can I put additional side conditions on the parameters entering the fit-function. I want for example that sum(a(0:4))=1.0

Just use
import numpy as np
def fitfunc(p, x):
a = np.zeros(7)
a[1:7] = p[:6]
a[0] = 1 - a[1:4].sum()
return a[0] + a[1]*exp(-x/a[4]) + a[2]*exp(-x/a[5]) + a[3]*exp(-x/a[6])
def errfunc(p, x, y1, y2):
return np.concatenate((
fitfunc(p[:6], x) - y1,
fitfunc(p[6:12], x) - y2
))
Generally, lambda-functions are considered bad style (and they don't add anything in your code). To have several functions in the least square fit you may just append the functions as I indicated using np.concatenate. It doesn't make much sense, if none of the parameters are correlated, though. It will only slow down convergence of the algorithm. The side condition you asked for, is implemented by just calculating one weight based on the constraint you gave (see 1 - a[1:4].sum()).

If you can't solve the equations for you constraints, and you can live with the constraint being satisfied with some tolerance, another possibility is to add a term to the chi-square with the large weight which guarantees that the constraint is almost satisfied.
E.g if you need that \sum(sin(p[i])==1 ,you can do the following:
constraint_func = lambda a: sin(a).sum()-1
def fitfunc (a,x):
np.concatenate((a[0]+a[1]*exp(-x/a[4])+a[2]*exp(-x/a[5])+a[3]*exp(-x /a[6]),
[constraint_func(a)]))
def errfunc(a,x,y):
tolerance = 1e-10
return np.concatenate((fitfunc(a,x) - y, [tolerance]))
Obviously the convergence will be slower, but will be still guaranteed.

Related

How do I understand the gradient for complex functions that do not satisfy the Cauchy Reimann Equation

Let us suppose that my function is
(z : C -> C)
z = x - i*y
now here the real part is,
u(x, y) = x
the imaginary part is,
v(x, y) = -y
so, when we get the derivatives, we find
d_u_x(x,y) = 1 # derivative of u wrt x
d_u_y(x,y) = 0
d_v_x(x, y) = 0
d_v_y(x, y) = -1
so, here,
d_u_x != d_v_y
thus, it does not follow Cauchy Reimann equation.
but, then comes the Wirtinger calculus, that says, I could write my function as,
u(x, y) = ((x + iy) + (x - iy))/2
= (z + z.conj())/2
v(x, y) = (((x + iy) - (x - iy))/2i
= (z - z.conj())/2i
but what after this, how do I find the gradient.
plus, in PyTorch, what is the correct way to specify such a function,
if I do,
import torch
a = torch.randn(1, dtype=torch.cfloat, requires_grad=True)
f = a.conj()
f.backward()
print(a.grad)
is this a correct way?

You may find the following page of interest:
When you use PyTorch to differentiate any function f(z) with complex domain and/or codomain, the gradients are computed under the assumption that the function is a part of a larger real-valued loss function g(input)=L. The gradient computed is ∂L/∂z* (note the conjugation of z), the negative of which is precisely the direction of steepest descent used in Gradient Descent algorithm. Thus, all the existing optimizers work out of the box with complex parameters.
This convention matches TensorFlow’s convention for complex differentiation, but is different from JAX (which computes ∂L/∂z).
If you have a real-to-real function which internally uses complex operations, the convention here doesn’t matter: you will always get the same result that you would have gotten if it had been implemented with only real operations.
...
For optimization problems, only real valued objective functions are used in the research community since complex numbers are not part of any ordered field and so having complex valued loss does not make much sense.
It also turns out that no interesting real-valued objective fulfill the Cauchy-Riemann equations. So the theory with homomorphic function cannot be used for optimization and most people therefore use the Wirtinger calculus.
https://pytorch.org/docs/stable/notes/autograd.html

Manual implementation of a Support Vector Machine

I am wondering is there any article where SVM (Support Vector Machine) is implemented manually in R or Python.
I do not want to use a built-in function or package. ?
The example could be very simple in terms of feature space and linear separable.
I just want to go through the whole process to enhance my understanding.

The answer to this question is rather broad since there are several possible algorithms in order to train SVMs. Also packages like LibSVM (available both for Python and R) are open-source so you are free to check the code inside.
Hereinafter I will consider the Sequential Minimal Optimization (SMO) algorithm by J. Pratt which is implemented in LibSVM. Implementing manually an algorithm which solves the SVM optimization problem is rather tedious but, if that's your first approach with SVMs I'd suggest the following (albeit simplified) version of the SMO algorithm
http://cs229.stanford.edu/materials/smo.pdf
This lecture is from Prof. Andrew Ng (Stanford) and he shows a simplified version of the SMO algorithm. I don't know what your theoretical background in SVMs is, but let's just say that the major difference is that the Lagrange multipliers pair (alpha_i and alpha_j) is randomly selected whereas in the original SMO algorithm there's a much harder heuristic involved.
In other terms, thus, this algorithm does not guarantee to converge to a global optimum (which is always true in SVMs for the dataset at hand if trained with a proper algorithm), but this can give you a nice introduction to the optimization problem behind SVMs.
This paper, however, does not show any codes in R or Python but the pseudo-code is rather straightforward and easy to implement (~100 lines of code in either Matlab or Python). Also keep in mind that this training algorithm returns alpha (Lagrange multipliers vector) and b (intercept). You might want to have some additional parameters such as the proper Support Vectors but starting from vector alpha such quantities are rather easy to calculate.
At first, let's suppose you have a LearningSet in which there are as many rows as there are patterns and as many columns as there are features and let also suppose that you have LearningLabels which is a vector with as many elements as there are patterns and this vector (as its name suggests) contains the proper labels for the patterns in LearningSet.
Also recall that alpha has as many elements as there are patterns.
In order to evaluate the Support Vector indices you can check whether element i in alpha is greater than or equal to 0: if alpha[i]>0 then the i-th pattern from LearningSet is a Support Vector. Similarly, the i-th element from LearningLabels is the related label.
Finally, you might want to evaluate vector w, the free parameters vector. Given that alpha is known, you can apply the following formula
w = ((alpha.*LearningLabels)'*LearningSet)'
where alpha and LearningLabels are column vectors and LearningSet is the matrix as described above. In the above formula .* is the element-wise product whereas ' is the transpose operator.

I would like to add a small supplement to the previous answer.
If you feel confident with linear algebra, there is an alternative derivation of the SMO algorithm and a very simple implementation based on that formulas.
Basically, it contains about 30 lines of code in Python
class SVM:
def __init__(self, kernel='linear', C=10000.0, max_iter=100000, degree=3, gamma=1):
self.kernel = {'poly' : lambda x,y: np.dot(x, y.T)**degree,
'rbf' : lambda x,y: np.exp(-gamma*np.sum((y - x[:,np.newaxis])**2, axis=-1)),
'linear': lambda x,y: np.dot(x, y.T)}[kernel]
self.C = C
self.max_iter = max_iter
def restrict_to_square(self, t, v0, u):
t = (np.clip(v0 + t*u, 0, self.C) - v0)[1]/u[1]
return (np.clip(v0 + t*u, 0, self.C) - v0)[0]/u[0]
def fit(self, X, y):
self.X = X.copy()
self.y = y * 2 - 1
self.lambdas = np.zeros_like(self.y, dtype=float)
self.K = self.kernel(self.X, self.X) * self.y[:,np.newaxis] * self.y
for _ in range(self.max_iter):
for idxM in range(len(self.lambdas)):
idxL = np.random.randint(0, len(self.lambdas))
Q = self.K[[[idxM, idxM], [idxL, idxL]], [[idxM, idxL], [idxM, idxL]]]
v0 = self.lambdas[[idxM, idxL]]
k0 = 1 - np.sum(self.lambdas * self.K[[idxM, idxL]], axis=1)
u = np.array([-self.y[idxL], self.y[idxM]])
t_max = np.dot(k0, u) / (np.dot(np.dot(Q, u), u) + 1E-15)
self.lambdas[[idxM, idxL]] = v0 + u * self.restrict_to_square(t_max, v0, u)
idx, = np.nonzero(self.lambdas > 1E-15)
self.b = np.sum((1.0 - np.sum(self.K[idx] * self.lambdas, axis=1)) * self.y[idx]) / len(idx)
def decision_function(self, X):
return np.sum(self.kernel(X, self.X) * self.y * self.lambdas, axis=1) + self.b
In simple cases it works not much worth than sklearn.svm.SVC, comparison shown below
For more elaborate explanation with formulas you may want to refer to this ResearchGate preprint.
Code for image generation can be found on GitHub.

Bound solutions in scipy ode solver

I want to solve a high amount of bilinear ODE systems in python. The derivative is this:
def x_(x, t, growth, connections):
return x * growth + np.dot(connections, x) * x
I am not interested in very accurate results but in the qualitative behavior, i.e. whether a component goes to zero or not.
Because I have to solve such a big quantity of high-deminsional systems, I want to use a step size as big as possible.
Due to big step sizes it can happen that the ODE goes in one component below zero. This should not be possible since (because of the structure of the particular ODE) each component is bounded by zero. Hence - to prevent wrong results - I would like to set each component manually to zero once it is below.
Furtherly, in the systems that I want to solve it can happen that solutions blow up. I want to prevent this by setting an upper bound as well, i.e. if a value exceeds the bound it is set back to the value of the bound.
I hope I can make my goal understandable giving you the following pseudo-code of what I want to do:
for t in range(0, tEnd, dt):
$ compute x(t) using x(t-dt) $
x(t) = np.minimum(np.maximum(x(t), 0), upperBound)
I implemented this using a Runge-Kutta algorithm. Everything works fine. Just the performance is bad. Therefore, I would prefer using a pre-implemented method like scipy.integrate.odeint.
Thereby, I have no idea on how to set such bounds. An option that I tried was to manipulate the ODE that way, that the derivative becomes 0 once x is above the bound, and (positive) one once x is below 0. In addition, to prevent too high jumps within one timestep, I also bounded the derivative:
def x_(x, t, growth, connections, bound):
return (x > 0) * np.minimum((x < bound) * \
( x * growth + np.dot(connections, x) * x ), bound) + (x < 0)
Though this solution (especially for the zero-bound) is very ugly it would be sufficient if it worked. Unfortunately, it does not work. Using odeint
x = scipy.integrate.odeint(x_, x0, timesteps, param)
I get very often one of these two errors:
Repeated convergence failures (perhaps bad Jacobian or tolerances).
Excess work done on this call (perhaps wrong Dfun type).
They may be due to the discontinuities of my manipulated ODE. There are plenty of threads about these error messages on the internet but they did not help me. E.g. increasing the amount of allowed steps did neither prevent this issue nor is it a good solution for me since I need to use big step sizes. Furtherly, passing the Jacobian did not help either.
Having a look onto the solutions one can see that two types of strange behavior happen when the errors occure:
The solution blows in one single time-step up to +-1e250 (that should be impossible since dx/dt is bounded).
It first reaches the bound but goes down again (that should be impossible because x is at the bound and therefore x_ is 0).
I would appreciate all hints on how to solve the issue - no matter whether it is help on
how to prevent the errors in odeint
how to manipulate the ODE properly or on
how to write a very fast ODE solver where I can directly implement my needs.
I thank you in advance!
Edit
I was asked for a minimal example:
import numpy as np
import random as rd
rd.seed()
import scipy.integrate
def simulate(simParam, dim = 20, connectivity = .8, conRange = 1, threshold = 1E-3,
constGrowing=None):
"""
Creates the random system matrix and starts a simulation
"""
x0 = np.zeros(dim, dtype='float') + 1
connections = np.zeros(shape=(dim, dim), dtype='float')
growth = np.zeros(dim, dtype='float') +
(constGrowing if not constGrowing == None else 0)
for i in range(dim):
for j in range(dim):
if i != j:
connections[i][j] = rd.uniform(-conRange, conRange)
tend, step = simParam
return RK4NumPy(x_, (growth, connections), x0, 0, tend, step)
def x_(x, t, growth, connections, bound):
"""
Derivative of the ODE
"""
return (x > 0) * np.minimum((x < bound) *
(x * growth + np.dot(connections, x) * x), bound) + (x < 0)
def RK4NumPy(x_, param, x0, t0, tend, step, maxV = 1.0E2, silent=True):
"""
solving method
"""
param = param + (maxV,)
timesteps = np.arange(t0 + step, tend, step)
return scipy.integrate.odeint(x_, x0, timesteps, param)
simulate((300, 0.5))
To see the solution one would have to plot x. With the given parameters I get very often the above mentioned error
Excess work done on this call (perhaps wrong Dfun type).
Run with full_output = 1 to get quantitative information.

How to find all zeros of a function using numpy (and scipy)?

Suppose I have a function f(x) defined between a and b. This function can have many zeros, but also many asymptotes. I need to retrieve all the zeros of this function. What is the best way to do it?
Actually, my strategy is the following:
I evaluate my function on a given number of points
I detect whether there is a change of sign
I find the zero between the points that are changing sign
I verify if the zero found is really a zero, or if this is an asymptote
U = numpy.linspace(a, b, 100) # evaluate function at 100 different points
c = f(U)
s = numpy.sign(c)
for i in range(100-1):
if s[i] + s[i+1] == 0: # oposite signs
u = scipy.optimize.brentq(f, U[i], U[i+1])
z = f(u)
if numpy.isnan(z) or abs(z) > 1e-3:
continue
print('found zero at {}'.format(u))
This algorithm seems to work, except I see two potential problems:
It will not detect a zero that doesn't cross the x axis (for example, in a function like f(x) = x**2) However, I don't think it can occur with the function I'm evaluating.
If the discretization points are too far, there could be more that one zero between them, and the algorithm could fail finding them.
Do you have a better strategy (still efficient) to find all the zeros of a function?
I don't think it's important for the question, but for those who are curious, I'm dealing with characteristic equations of wave propagation in optical fiber. The function looks like (where V and ell are previously defined, and ell is an positive integer):
def f(u):
w = numpy.sqrt(V**2 - u**2)
jl = scipy.special.jn(ell, u)
jl1 = scipy.special.jnjn(ell-1, u)
kl = scipy.special.jnkn(ell, w)
kl1 = scipy.special.jnkn(ell-1, w)
return jl / (u*jl1) + kl / (w*kl1)

Why are you limited to numpy? Scipy has a package that does exactly what you want:
http://docs.scipy.org/doc/scipy/reference/optimize.nonlin.html
One lesson I've learned: numerical programming is hard, so don't do it :)
Anyway, if you're dead set on building the algorithm yourself, the doc page on scipy I linked (takes forever to load, btw) gives you a list of algorithms to start with. One method that I've used before is to discretize the function to the degree that is necessary for your problem. (That is, tune \delta x so that it is much smaller than the characteristic size in your problem.) This lets you look for features of the function (like changes in sign). AND, you can compute the derivative of a line segment (probably since kindergarten) pretty easily, so your discretized function has a well-defined first derivative. Because you've tuned the dx to be smaller than the characteristic size, you're guaranteed not to miss any features of the function that are important for your problem.
If you want to know what "characteristic size" means, look for some parameter of your function with units of length or 1/length. That is, for some function f(x), assume x has units of length and f has no units. Then look for the things that multiply x. For example, if you want to discretize cos(\pi x), the parameter that multiplies x (if x has units of length) must have units of 1/length. So the characteristic size of cos(\pi x) is 1/\pi. If you make your discretization much smaller than this, you won't have any issues. To be sure, this trick won't always work, so you may need to do some tinkering.

I found out it's relatively easy to implement your own root finder using the scipy.optimize.fsolve.
Idea: Find any zeroes from interval (start, stop) and stepsize step by calling the fsolve repeatedly with changing x0. Use relatively small stepsize to find all the roots.
Can only search for zeroes in one dimension (other dimensions must be fixed). If you have other needs, I would recommend using sympy for calculating the analytical solution.
Note: It may not always find all the zeroes, but I saw it giving relatively good results. I put the code also to a gist, which I will update if needed.
import numpy as np
import scipy
from scipy.optimize import fsolve
from matplotlib import pyplot as plt
# Defined below
r = RootFinder(1, 20, 0.01)
args = (90, 5)
roots = r.find(f, *args)
print("Roots: ", roots)
# plot results
u = np.linspace(1, 20, num=600)
fig, ax = plt.subplots()
ax.plot(u, f(u, *args))
ax.scatter(roots, f(np.array(roots), *args), color="r", s=10)
ax.grid(color="grey", ls="--", lw=0.5)
plt.show()
Example output:
Roots: [ 2.84599497 8.82720551 12.38857782 15.74736542 19.02545276]
zoom-in:
RootFinder definition
import numpy as np
import scipy
from scipy.optimize import fsolve
from matplotlib import pyplot as plt
class RootFinder:
def __init__(self, start, stop, step=0.01, root_dtype="float64", xtol=1e-9):
self.start = start
self.stop = stop
self.step = step
self.xtol = xtol
self.roots = np.array([], dtype=root_dtype)
def add_to_roots(self, x):
if (x < self.start) or (x > self.stop):
return # outside range
if any(abs(self.roots - x) < self.xtol):
return # root already found.
self.roots = np.append(self.roots, x)
def find(self, f, *args):
current = self.start
for x0 in np.arange(self.start, self.stop + self.step, self.step):
if x0 < current:
continue
x = self.find_root(f, x0, *args)
if x is None: # no root found.
continue
current = x
self.add_to_roots(x)
return self.roots
def find_root(self, f, x0, *args):
x, _, ier, _ = fsolve(f, x0=x0, args=args, full_output=True, xtol=self.xtol)
if ier == 1:
return x[0]
return None
Test function
The scipy.special.jnjn does not exist anymore, but I created similar test function for the case.
def f(u, V=90, ell=5):
w = np.sqrt(V ** 2 - u ** 2)
jl = scipy.special.jn(ell, u)
jl1 = scipy.special.yn(ell - 1, u)
kl = scipy.special.kn(ell, w)
kl1 = scipy.special.kn(ell - 1, w)
return jl / (u * jl1) + kl / (w * kl1)

The main problem I see with this is if you can actually find all roots --- as have already been mentioned in comments, this is not always possible. If you are sure that your function is not completely pathological (sin(1/x) was already mentioned), the next one is what's your tolerance to missing a root or several of them. Put differently, it's about to what length you are prepared to go to make sure you did not miss any --- to the best of my knowledge, there is no general method to isolate all the roots for you, so you'll have to do it yourself. What you show is a reasonable first step already. A couple of comments:
Brent's method is indeed a good choice here.
First of all, deal with the divergencies. Since in your function you have Bessels in the denominators, you can first solve for their roots -- better look them up in e.g., Abramovitch and Stegun (Mathworld link). This will be a better than using an ad hoc grid you're using.
What you can do, once you've found two roots or divergencies, x_1 and x_2, run the search again in the interval [x_1+epsilon, x_2-epsilon]. Continue until no more roots are found (Brent's method is guaranteed to converge to a root, provided there is one).
If you cannot enumerate all the divergencies, you might want to be a little more careful in verifying a candidate is indeed a divergency: given x don't just check that f(x) is large, check that, e.g. |f(x-epsilon/2)| > |f(x-epsilon)| for several values of epsilon (1e-8, 1e-9, 1e-10, something like that).
If you want to make sure you don't have roots which simply touch zero, look for the extrema of the function, and for each extremum, x_e, check the value of f(x_e).

I've also encountered this problem to solve equations like f(z)=0 where f was an holomorphic function. I wanted to be sure not to miss any zero and finally developed an algorithm which is based on the argument principle.
It helps to find the exact number of zeros lying in a complex domain. Once you know the number of zeros, it is easier to find them. There are however two concerns which must be taken into account :
Take care about multiplicity : when solving (z-1)^2 = 0, you'll get two zeros as z=1 is counting twice
If the function is meromorphic (thus contains poles), each pole reduce the number of zero and break the attempt to count them.

scipy optimize - fmin Nelder-Mead simplex

I'm trying to use the scipy Nelder-Mead simplex search function to find a minimum to a non-linear function. It appears my simplex gets stuck because it starts off with an initial simplex that is too small. Unfortunately, I don't see anywhere in scipy where you can change some of the simplex parameters (e.g. initial simplex size). Is there a way? Am I missing something? Or are there other implementations of the NM simplex?
Thanks

Two suggestions for Nelder-Mead:
1) snap all x to a grid, say .01, inside the function:
x = np.round( x / grid ) * grid
f = ...
This acts as a simple noise filter in high dimensions
(in 2d or 3d, don't bother).
2) start off with the best d+1 of 2d+1 nearby points,
instead of the usual d+1:
def neard1( func, x, h, verbose=1 ):
""" eval func at 2d+1 points x, x +- h
sort
-> f[ d+1 best values ], X[ d+1 ]
to start or restart Nelder-Mead
"""
dim = len(x)
I = np.eye(dim)
np.fill_diagonal( I, h ) # scalar or vec
X = x + np.vstack(( np.zeros(dim), I, - I ))
fnear = np.array([ func( x ) for x in X ]) # 2d+1
f0 = fnear[0]
up = np.argsort( fnear ) # vec func: |fnear|
if verbose:
print "neard1: f %g +- %s around x %s" % (
f0, fnear[up] - f0, x )
bestd1 = up[:dim+1]
return fnear[bestd1], X[bestd1]
It's also not a bad idea to look at the neard1() values after Nelder-Mead,
to get an idea of what func() looks like there.
If any neighbors are better then the N-M "best", restart N-M from that new simplex.
(One can alternate neard1, N-M, neard1, N-M: easy but very problem-dependent.)
How many variables do you have, and how noisy is your function ?
Hope this helps

From the reference at http://docs.scipy.org/doc/:
Method Nelder-Mead uses the Simplex algorithm [R123], [R124]. This algorithm has been successful in many applications but other algorithms using the first and/or second derivatives information might be preferred for their better performances and robustness in general.
It may be recommended to use a completely different algorithm, then. Note that:
Method BFGS uses the quasi-Newton method of Broyden, Fletcher, Goldfarb, and Shanno (BFGS) [R127] pp. 136. It uses the first derivatives only. BFGS has proven good performance even for non-smooth optimizations. This method also returns an approximation of the Hessian inverse, stored as hess_inv in the OptimizeResult object.
BFGS sounds more robust and faster overall.
ParagonRG

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.