curve_fit from scipy generates wrong values - python

I have several data from measurements, and I wanted to fit my own function onto this data. For that I wanted to use curve_fit. Therefore I implemented it in the following way:
def func(self, w, H, T, L):
B = (L + 0.25)*((2 * math.pi / T)** 4)
S = (B**L)*H*H / (4 * scipy.special.gamma(L)*(w**( 4 * L + 1)))*numpy.exp(-B / (w*w*w*w))
return S
def fit_this(self):
self.popt, self.copt = curve_fit(self.func, self.xdata, self.ydata)
print self.popt
print self.copt
self.xdata and self.ydata contain my measured values. I already did the fit with the same function in another program (matlab) and therefore I already know working parameters. But when start this program, I get for popt definitely wrong parameters, and the covariance of the values is somewhere in the range of 10e+22, and therefore completely useless. Why is that happening (Guess: Wrong starting values), and how can I improve this situation? If providing my data helps I can provide them, too.

Related

Creating a numpy array from repeating a function n times with random behavior inside the function without for loops

I am doing optimizations in my code and would like to remove this for loop if possible. I have the following code:
def f(x):
return np.pi * x * np.cos((np.pi / 2) * x ** 2)
def monte_carlo(b, n):
xrand = np.random.uniform(0, b, (1, n))
integral = np.sum(f(xrand))
return b/n * integral
data = np.array([monte_carlo(b, n) for _ in range(100)])
In this code b is the upper limit (usually between 0 and 16) and n is the number of samples used (usually between 10E4 and 10E7). The monte carlo function is used for a basic monte carlo integration of the integral between 0 and b of the function f(x).
What I'm trying to do is to remove the for loop in data. I know it's not really a performance issue, but my assignment asks to try and do performance optimizations. I'm using a comprehension right now which is a little faster than a normal for loop but I think there's more optimization that can be done, pythons for loops are rather slow so I'm trying to get rid of them first.
I've looked at functions like numpy.repeat and numpy.tile but they don't work since that only runs the function once and then keeps repeating that result of the function. I've also looked at numpy.fromfunction, but I can't seem to get that to work and don't think I can use that. The documentation on numpy.fromfunction says "Construct an array by executing a function over each coordinate". I don't think that's particularly useful since the arguments for the functions stay the same. It's the random element inside the function that creates the difference in each iteration.
If anyone knows how I can solve this that would be very much appreciated, or has a different numpy function I didn't find during my google search that does what I need.
Thanks in advance!
You can basically do all the 100 iterations or how many ever you want in a single shot. See the code below-
def f(x):
return np.pi * x * np.cos((np.pi / 2) * x ** 2)
def monte_carlo(b, n, N):
xrand = np.random.uniform(0, b, (N, n))
integral = np.sum(f(xrand), axis=1)
return b / n * integral
# data = np.array([monte_carlo(b, n) for _ in range(100)])
data = monte_carlo(b, n, 100).T

Unknown error with self-defined function for approximation of an integral

I've defined the following function as a method of approximating an integral using Boole's Rule:
def integrate_boole(f,l,r,N):
h=((r-l)/N)
xN = np.linspace(l,r,N+1)
fN = f(xN)
return ((2*h)/45)*(7*fN[0]+32*(np.sum(fN[1:-2:2]))+12*(np.sum(fN[2:-3:4]))+14*(np.sum(fN[4:-5]))+7*fN[-1])
I used the function to get the value of the integral for sin(x)dx between 0 and pi (where N=8) and assigned it to a variable sine_int.
The answer given was 1.3938101893248442
After doing the original equation (see here) out by hand I realised this answer was quite inaccurate.
The sums of fN are giving incorrect values, but I'm not sure why. For example, np.sum(fN[4:-5]) is going to 0.
Is there a better way of coding the sums involved, or is there an error in my parameters that's causing the calculations to be inaccurate?
Thanks in advance.
EDIT
I should have made it clearer that this is supposed to be a composite version of the rule, i.e. approximating over N points where N is divisible by 4. So the typical 5 points with 4 intervals isn't going to cut it here, unfortunately. I would copy the equation I'm using into here, but I don't have an image of it and LaTex isn't an option. It should/might be clear from the code I have after return.
From a quick inspection looks like the term multiplying f(x_4) should be 32, not 14:
def integrate_boole(f,l,r,N):
h=((r-l)/N)
xN = np.linspace(l,r,N+1)
fN = f(xN)
return ((2*h)/45)*(7*fN[0]+32*(np.sum(fN[1:-2:2]))+
12*(np.sum(fN[2:-3:4]))+32*(np.sum(fN[4:-5]))+7*fN[-1])
First, one of your coefficients was wrong as pointed out by #nixon. Then, I think you do not really understand how the Boole's rule works - It approximates the integral of a function only using 5 points of the function. Hence, the terms like np.sum(fN[1:-2:2]) makes no sense. You only need five points, which you can obtain with xN = np.linspace(l,r,5). Your h is simply the distance between 2 of the contiguos points h = xN[1] - xN[0]. And then, easy peasy:
import numpy as np
def integrate_boole(f,l,r):
xN = np.linspace(l,r,5)
h = xN[1] - xN[0]
fN = f(xN)
return ((2*h)/45)*(7*fN[0]+32*fN[1]+12*fN[2]+32*fN[3]+7*fN[4])
def f(x):
return np.sin(x)
I = integrate_boole(f, 0, np.pi)
print(I) # Outputs 1.99857...
I'm not sure what you're hoping your code does w.r.t. Boole's rule. Why are you summing over samples of the function (i.e. np.sum(fN[2:-3:4]))? I think your N parameter is also not well defined and I'm not sure what it's supposed to represent. Maybe you're using another rule I'm not familiar with: I'll let you decide.
Regardless, here's an implementation of Boole's rule as Wikipedia defines it. Variables map to the Wikipedia version you linked:
def integ_boole(func, left, right):
h = (right - left) / 4
x1 = left
x2 = left + h
x3 = left + 2*h
x4 = left + 3*h
x5 = right # or left + 4h
result = (2*h / 45) * (7*func(x1) + 32*func(x2) + 12*func(x3) + 32*func(x4) + 7*func(x5))
return result
then, to test:
import numpy as np
print(integ_boole(np.sin, 0, np.pi))
outputs 1.9985707318238357, which is extremely close to the correct answer of 2.
HTH.

Calculating Beta Binomial likelihood with n>1000

I'm struggling with a numerical precision issue in calculating the beta binomial likelihood. My goal is to estimate the probability y mod 10 = d for some digit d, given that y is Binomial(n,p) and p is Beta(a,b). I'm trying to come up with a fast solution for large n, by which I mean at least 1000. One thing that seems to be giving me reasonable answers is to use simulation.
def npliketest_exact(n,digit,a,b):
#draw 1000 values of p
probs = np.array(beta.rvs(a,b,size=1000))
#create an array of numbers whose last digit is digit
digits = np.arange(digit,n+1,10)
#create a function that calculates the pmf at x given p
exact_func = lambda x,p: binom(n,p).pmf(x)
#given p, the likeklihood of last digit "digit" is the sum over all entries in digits
likelihood = lambda p: exact_func(digits,p).sum()
#return the average of that likelihood over all the draws
return np.vectorize(likelihood)(probs).mean()
np.random.seed(1)
print npliketest_exact(1000,9,1,1) #0.0992310195195
This might be ok, but I'm worried about the precision of this strategy. In particular if there's a better/more precise way to do this calculation I'm eager to figure out how to do it.
I've started trying to use the log likelihood to come up with that answer, but I'm running into numerical stability issues even with that.
def llike(n,k,a,b):
out = gammaln(n+1) + gammaln(k+a) + gammaln(n-k+b) + gammaln(a+b) - \
( gammaln(k+1) + gammaln(n-k+1) + gammaln(a) + gammaln(b) + gammaln(n+a+b) )
return out
print exp(llike(1000,9,1,1)) #.000999000999001
print exp(llike(1000,500,1,1)) #.000999000999001
Since the beta 1,1 has mean 0.5, the probability of getting y=500 from a beta binomial with n=1000 should be much higher than getting 9, but the above calculations show a suspicious constant value.
Another thing I tried, which was suggested elsewhere on stackoverflow to deal with this problem, was to use some clever tricks to support numerical stability that are apparently hidden in scipy's betaln formula.
def binomln(n, k): #log of the binomial coefficient
# Assumes binom(n, k) >= 0
return -betaln(1 + n - k, 1 + k) - log(n + 1)
def log_betabinom_exact(n,k,a,b):
return binomln(n,k) + betaln(k+a,n-k+b) - betaln(a,b)
print exp(log_betabinom_exact(1000,9,1,1)) #.000999000999001
print exp(log_betabinom_exact(1000,500,1,1)) #0.000999000999001
Again, same suspicious constants. Would appreciate any advice. Would using sympy be of any help on this?
**** Followup
Sorry guys, dumb mistake on my part, Beta(1,1) is uniform so the results I was getting make sense. Trying different parameters makes things look differnet for different values of k.

Bound solutions in scipy ode solver

I want to solve a high amount of bilinear ODE systems in python. The derivative is this:
def x_(x, t, growth, connections):
return x * growth + np.dot(connections, x) * x
I am not interested in very accurate results but in the qualitative behavior, i.e. whether a component goes to zero or not.
Because I have to solve such a big quantity of high-deminsional systems, I want to use a step size as big as possible.
Due to big step sizes it can happen that the ODE goes in one component below zero. This should not be possible since (because of the structure of the particular ODE) each component is bounded by zero. Hence - to prevent wrong results - I would like to set each component manually to zero once it is below.
Furtherly, in the systems that I want to solve it can happen that solutions blow up. I want to prevent this by setting an upper bound as well, i.e. if a value exceeds the bound it is set back to the value of the bound.
I hope I can make my goal understandable giving you the following pseudo-code of what I want to do:
for t in range(0, tEnd, dt):
$ compute x(t) using x(t-dt) $
x(t) = np.minimum(np.maximum(x(t), 0), upperBound)
I implemented this using a Runge-Kutta algorithm. Everything works fine. Just the performance is bad. Therefore, I would prefer using a pre-implemented method like scipy.integrate.odeint.
Thereby, I have no idea on how to set such bounds. An option that I tried was to manipulate the ODE that way, that the derivative becomes 0 once x is above the bound, and (positive) one once x is below 0. In addition, to prevent too high jumps within one timestep, I also bounded the derivative:
def x_(x, t, growth, connections, bound):
return (x > 0) * np.minimum((x < bound) * \
( x * growth + np.dot(connections, x) * x ), bound) + (x < 0)
Though this solution (especially for the zero-bound) is very ugly it would be sufficient if it worked. Unfortunately, it does not work. Using odeint
x = scipy.integrate.odeint(x_, x0, timesteps, param)
I get very often one of these two errors:
Repeated convergence failures (perhaps bad Jacobian or tolerances).
Excess work done on this call (perhaps wrong Dfun type).
They may be due to the discontinuities of my manipulated ODE. There are plenty of threads about these error messages on the internet but they did not help me. E.g. increasing the amount of allowed steps did neither prevent this issue nor is it a good solution for me since I need to use big step sizes. Furtherly, passing the Jacobian did not help either.
Having a look onto the solutions one can see that two types of strange behavior happen when the errors occure:
The solution blows in one single time-step up to +-1e250 (that should be impossible since dx/dt is bounded).
It first reaches the bound but goes down again (that should be impossible because x is at the bound and therefore x_ is 0).
I would appreciate all hints on how to solve the issue - no matter whether it is help on
how to prevent the errors in odeint
how to manipulate the ODE properly or on
how to write a very fast ODE solver where I can directly implement my needs.
I thank you in advance!
Edit
I was asked for a minimal example:
import numpy as np
import random as rd
rd.seed()
import scipy.integrate
def simulate(simParam, dim = 20, connectivity = .8, conRange = 1, threshold = 1E-3,
constGrowing=None):
"""
Creates the random system matrix and starts a simulation
"""
x0 = np.zeros(dim, dtype='float') + 1
connections = np.zeros(shape=(dim, dim), dtype='float')
growth = np.zeros(dim, dtype='float') +
(constGrowing if not constGrowing == None else 0)
for i in range(dim):
for j in range(dim):
if i != j:
connections[i][j] = rd.uniform(-conRange, conRange)
tend, step = simParam
return RK4NumPy(x_, (growth, connections), x0, 0, tend, step)
def x_(x, t, growth, connections, bound):
"""
Derivative of the ODE
"""
return (x > 0) * np.minimum((x < bound) *
(x * growth + np.dot(connections, x) * x), bound) + (x < 0)
def RK4NumPy(x_, param, x0, t0, tend, step, maxV = 1.0E2, silent=True):
"""
solving method
"""
param = param + (maxV,)
timesteps = np.arange(t0 + step, tend, step)
return scipy.integrate.odeint(x_, x0, timesteps, param)
simulate((300, 0.5))
To see the solution one would have to plot x. With the given parameters I get very often the above mentioned error
Excess work done on this call (perhaps wrong Dfun type).
Run with full_output = 1 to get quantitative information.

How to find all zeros of a function using numpy (and scipy)?

Suppose I have a function f(x) defined between a and b. This function can have many zeros, but also many asymptotes. I need to retrieve all the zeros of this function. What is the best way to do it?
Actually, my strategy is the following:
I evaluate my function on a given number of points
I detect whether there is a change of sign
I find the zero between the points that are changing sign
I verify if the zero found is really a zero, or if this is an asymptote
U = numpy.linspace(a, b, 100) # evaluate function at 100 different points
c = f(U)
s = numpy.sign(c)
for i in range(100-1):
if s[i] + s[i+1] == 0: # oposite signs
u = scipy.optimize.brentq(f, U[i], U[i+1])
z = f(u)
if numpy.isnan(z) or abs(z) > 1e-3:
continue
print('found zero at {}'.format(u))
This algorithm seems to work, except I see two potential problems:
It will not detect a zero that doesn't cross the x axis (for example, in a function like f(x) = x**2) However, I don't think it can occur with the function I'm evaluating.
If the discretization points are too far, there could be more that one zero between them, and the algorithm could fail finding them.
Do you have a better strategy (still efficient) to find all the zeros of a function?
I don't think it's important for the question, but for those who are curious, I'm dealing with characteristic equations of wave propagation in optical fiber. The function looks like (where V and ell are previously defined, and ell is an positive integer):
def f(u):
w = numpy.sqrt(V**2 - u**2)
jl = scipy.special.jn(ell, u)
jl1 = scipy.special.jnjn(ell-1, u)
kl = scipy.special.jnkn(ell, w)
kl1 = scipy.special.jnkn(ell-1, w)
return jl / (u*jl1) + kl / (w*kl1)
Why are you limited to numpy? Scipy has a package that does exactly what you want:
http://docs.scipy.org/doc/scipy/reference/optimize.nonlin.html
One lesson I've learned: numerical programming is hard, so don't do it :)
Anyway, if you're dead set on building the algorithm yourself, the doc page on scipy I linked (takes forever to load, btw) gives you a list of algorithms to start with. One method that I've used before is to discretize the function to the degree that is necessary for your problem. (That is, tune \delta x so that it is much smaller than the characteristic size in your problem.) This lets you look for features of the function (like changes in sign). AND, you can compute the derivative of a line segment (probably since kindergarten) pretty easily, so your discretized function has a well-defined first derivative. Because you've tuned the dx to be smaller than the characteristic size, you're guaranteed not to miss any features of the function that are important for your problem.
If you want to know what "characteristic size" means, look for some parameter of your function with units of length or 1/length. That is, for some function f(x), assume x has units of length and f has no units. Then look for the things that multiply x. For example, if you want to discretize cos(\pi x), the parameter that multiplies x (if x has units of length) must have units of 1/length. So the characteristic size of cos(\pi x) is 1/\pi. If you make your discretization much smaller than this, you won't have any issues. To be sure, this trick won't always work, so you may need to do some tinkering.
I found out it's relatively easy to implement your own root finder using the scipy.optimize.fsolve.
Idea: Find any zeroes from interval (start, stop) and stepsize step by calling the fsolve repeatedly with changing x0. Use relatively small stepsize to find all the roots.
Can only search for zeroes in one dimension (other dimensions must be fixed). If you have other needs, I would recommend using sympy for calculating the analytical solution.
Note: It may not always find all the zeroes, but I saw it giving relatively good results. I put the code also to a gist, which I will update if needed.
import numpy as np
import scipy
from scipy.optimize import fsolve
from matplotlib import pyplot as plt
# Defined below
r = RootFinder(1, 20, 0.01)
args = (90, 5)
roots = r.find(f, *args)
print("Roots: ", roots)
# plot results
u = np.linspace(1, 20, num=600)
fig, ax = plt.subplots()
ax.plot(u, f(u, *args))
ax.scatter(roots, f(np.array(roots), *args), color="r", s=10)
ax.grid(color="grey", ls="--", lw=0.5)
plt.show()
Example output:
Roots: [ 2.84599497 8.82720551 12.38857782 15.74736542 19.02545276]
zoom-in:
RootFinder definition
import numpy as np
import scipy
from scipy.optimize import fsolve
from matplotlib import pyplot as plt
class RootFinder:
def __init__(self, start, stop, step=0.01, root_dtype="float64", xtol=1e-9):
self.start = start
self.stop = stop
self.step = step
self.xtol = xtol
self.roots = np.array([], dtype=root_dtype)
def add_to_roots(self, x):
if (x < self.start) or (x > self.stop):
return # outside range
if any(abs(self.roots - x) < self.xtol):
return # root already found.
self.roots = np.append(self.roots, x)
def find(self, f, *args):
current = self.start
for x0 in np.arange(self.start, self.stop + self.step, self.step):
if x0 < current:
continue
x = self.find_root(f, x0, *args)
if x is None: # no root found.
continue
current = x
self.add_to_roots(x)
return self.roots
def find_root(self, f, x0, *args):
x, _, ier, _ = fsolve(f, x0=x0, args=args, full_output=True, xtol=self.xtol)
if ier == 1:
return x[0]
return None
Test function
The scipy.special.jnjn does not exist anymore, but I created similar test function for the case.
def f(u, V=90, ell=5):
w = np.sqrt(V ** 2 - u ** 2)
jl = scipy.special.jn(ell, u)
jl1 = scipy.special.yn(ell - 1, u)
kl = scipy.special.kn(ell, w)
kl1 = scipy.special.kn(ell - 1, w)
return jl / (u * jl1) + kl / (w * kl1)
The main problem I see with this is if you can actually find all roots --- as have already been mentioned in comments, this is not always possible. If you are sure that your function is not completely pathological (sin(1/x) was already mentioned), the next one is what's your tolerance to missing a root or several of them. Put differently, it's about to what length you are prepared to go to make sure you did not miss any --- to the best of my knowledge, there is no general method to isolate all the roots for you, so you'll have to do it yourself. What you show is a reasonable first step already. A couple of comments:
Brent's method is indeed a good choice here.
First of all, deal with the divergencies. Since in your function you have Bessels in the denominators, you can first solve for their roots -- better look them up in e.g., Abramovitch and Stegun (Mathworld link). This will be a better than using an ad hoc grid you're using.
What you can do, once you've found two roots or divergencies, x_1 and x_2, run the search again in the interval [x_1+epsilon, x_2-epsilon]. Continue until no more roots are found (Brent's method is guaranteed to converge to a root, provided there is one).
If you cannot enumerate all the divergencies, you might want to be a little more careful in verifying a candidate is indeed a divergency: given x don't just check that f(x) is large, check that, e.g. |f(x-epsilon/2)| > |f(x-epsilon)| for several values of epsilon (1e-8, 1e-9, 1e-10, something like that).
If you want to make sure you don't have roots which simply touch zero, look for the extrema of the function, and for each extremum, x_e, check the value of f(x_e).
I've also encountered this problem to solve equations like f(z)=0 where f was an holomorphic function. I wanted to be sure not to miss any zero and finally developed an algorithm which is based on the argument principle.
It helps to find the exact number of zeros lying in a complex domain. Once you know the number of zeros, it is easier to find them. There are however two concerns which must be taken into account :
Take care about multiplicity : when solving (z-1)^2 = 0, you'll get two zeros as z=1 is counting twice
If the function is meromorphic (thus contains poles), each pole reduce the number of zero and break the attempt to count them.

Categories

Resources