Background:
I am trying to implement a function that performs inverse transform sampling. I use SymPy to calculate the CDF and to get its inverse. For some simple PDFs I get correct results, but for a PDF whose inverse CDF involves the Lambert W function, the results are wrong.
Example:
Consider the following example CDF:
import sympy as sym
y = sym.Symbol('y')
cdf = (-y - 1) * sym.exp(-y) + 1  # derived from `pdf = y * sym.exp(-y)`
sym.plot(cdf, (y, -1, 5))
Now calculate the inverse of this function:
x = sym.Symbol('x')
inverse = sym.solve(sym.Eq(x, cdf), y)
print(inverse)
Output:
[-LambertW((x - 1)*exp(-1)) - 1]
This is, in fact, only the left branch (negative y) of the given CDF:
sym.plot(inverse[0], (x, -0.5, 1))
Question:
How can I get the right branch (positive y) of the given CDF?
What I tried:
Specifying x and y to be only positive:
x = sym.Symbol('x', positive=True)
y = sym.Symbol('y', positive=True)
This doesn't have any effect, even for the first CDF plot.
Making CDF a Piecewise function:
cdf = sym.Piecewise((0, y < 0),
                    ((-y - 1) * sym.exp(-y) + 1, True))
Again, no effect. The strange thing here is that on another computer plotting this function gave a proper graph with zero for negative y, but solving for the positive-y branch doesn't work anywhere. (Different versions? I also had to pass adaptive=False to sympy.plot to make it work there.)
Using sympy.solveset instead of sympy.solve:
This just gives a useless ConditionSet(y, Eq(x*exp(y) + y - exp(y) + 1, 0), Complexes(S.Reals x S.Reals, False)) as a result. Apparently, solveset still doesn't know how to deal with LambertW functions. From the docs:
When cases which are not solved or can only be solved incompletely, a
ConditionSet is used and acts as an unevaluated solveset object. <...>
There are still a few things solveset can’t do, which the old solve
can, such as solving non linear multivariate & LambertW type
equations.
Is it a bug or am I missing something? Is there any workaround to get the desired result?
The inverse produced by SymPy is almost correct. The problem lies in the fact that the Lambert W function has multiple branches over the domain (-1/e, 0). By default it uses the upper branch; for your problem, however, you need the lower branch, which can be accessed by passing -1 as the second argument to LambertW.
inverse = -sym.LambertW((x - 1)*sym.exp(-1), -1) - 1
sym.plot(inverse, (x, 0, 0.999))
This gives the right branch (positive y) of the inverse (plot omitted).
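If the goal is actually to draw samples, the lower-branch expression can be evaluated numerically; here is a minimal sketch using scipy.special.lambertw (which supports branch selection via its k argument) rather than the SymPy expression. The sanity check against the known mean is my own addition.

# Sketch: inverse transform sampling with the lower-branch inverse CDF.
import numpy as np
from scipy.special import lambertw

rng = np.random.default_rng(0)
u = rng.uniform(0, 1, size=100_000)                       # uniform samples in (0, 1)
samples = -lambertw((u - 1) * np.exp(-1), k=-1).real - 1  # inverse CDF, lower branch

# pdf = y*exp(-y) is a Gamma(2, 1) density, so the sample mean should be close to 2
print(samples.mean())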
Related
I'm hoping to just get some assistance conceptually about how to solve a linear system of equations with penalty functions. Example code is at the bottom.
Let's say I'm trying to do a fit of this equation:
Ie = Ia*X + Ib*Y + Ic*Z
where Ie, Ia, Ib, and Ic are constants, and X,Y,Z are variables
I could easily solve this system of equations using scipy's least_squares, but I want to constrain the system with 2 constraints:
1. X+Y+Z=1
2. X,Y,Z > 0 and X,Y,Z < 1
To do this, I then modified the above function.
Using X+Y+Z=1, I solved for Z and substituted it in, which gives Ie-Ic = X(Ia-Ic) + Y(Ib-Ic).
Therefore the system becomes
Ax=B where A=[Ia-Ic, Ib-Ic] and B=[Ie-Ic], given bounds (0,1)
This handles the 2nd criterion (X,Y,Z > 0 and X,Y,Z < 1), but it does not handle the 1st.
To resolve the first issue, an additional constraint is needed, X+Y < 1, and this I don't quite know how to do.
So I presume least_squares has some built-in penalty function for its bounds, i.e.
chi^2 = ||Ax - B||^2 + P
where P is a penalty term: whenever X, Y, or Z fall outside (0, 1), P = 10000, giving high chi-squared values and thus enforcing the bounds.
I don't know how I can add a further condition of the same kind, e.g. if X+Y > 1 then P = 10000, or something similar along those lines.
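For concreteness, the penalty idea just described could be written down directly with least_squares; this is only a sketch, using the same toy A and B arrays as in the example further down and an arbitrary penalty weight of 1e4:

# Sketch of the penalty formulation described above (toy data, arbitrary weight).
import numpy as np
from scipy.optimize import least_squares

A = np.array([[1., 2.], [2., 4.], [3., 4.]])  # stands in for [Ia-Ic, Ib-Ic]
B = np.array([0., 1., 2.])                    # stands in for Ie-Ic

def residuals(p):
    X, Y = p
    res = A @ p - B
    penalty = 1e4 * max(0.0, X + Y - 1.0)     # fires only when X + Y > 1
    return np.append(res, penalty)

sol = least_squares(residuals, x0=[0.3, 0.3], bounds=(0, 1))
X, Y = sol.x
Z = 1 - X - Y
print(X, Y, Z)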
In short, least_squares enables you to set bounds on the individual values, but I'd like to set some further constraints, and I don't quite know how to do this with least_squares. I've seen additional inequality constraint options in scipy.minimize, but I don't quite know how to apply that to a linear system of equations with the format Ax=B.
So as an example, let's say I've already done the calculations and obtained my A matrix of constants and my B vector. I use least squares, get my values for X and Y, and can calculate Z since X+Y+Z=1. The issue here is that in my minimization I did not set a constraint that X+Y < 1, so in some cases you can actually get values where X+Y > 1. So I'd like to find a method where I can set that additional constraint, in addition to the bounds constraint for the individual variables:
import numpy as np
from scipy.optimize import lsq_linear

Ax = np.array([[1, 2], [2, 4], [3, 4]])
B = np.array([0, 1, 2])
solution = lsq_linear(Ax, B, lsq_solver='lsmr', bounds=(0, 1))
X = solution.x[0]
Y = solution.x[1]
Z = 1 - sum(solution.x)
If minimize is the solution here, can you please show me how to set it up given the above matrix of A and array of B?
Any advice, tips, or help to point me in the right direction is greatly appreciated!
Edit:
So I found something similar on here: Minimizing Least Squares with Algebraic Constraints and Bounds
So I thought I'd apply it to my case, but I don't think I've been able to apply it properly.
import numpy as np
from scipy.optimize import minimize, lsq_linear

Ax = np.array([[1, 2], [2, 4], [3, 4]])
B = np.array([0, 1, 2])

def fun(x, a1, a2, y):
    fun_output = x[0]*a1 + x[1]*a2
    return np.sum((fun_output - y)**2)

cons = [{"type": "eq", "fun": lambda x: x[0] + x[1] - 1}]
bnds = [(0, 1), (0, 1)]
xinit = np.array([1, 1])

solution = minimize(fun, args=(Ax[:, 0], Ax[:, 1], B), x0=xinit, bounds=bnds, constraints=cons)
solution_2 = lsq_linear(Ax, B, bounds=(0, 1))

print(solution.x)
print(solution_2.x)
The issue is that the output of this differs from lsq_linear, and I almost always get a value for Z that is very close to zero, regardless of what the input arrays are. I don't know if I'm setting this up / understanding this correctly.
Your initial guess xinit is not feasible and doesn't satisfy your constraint.
IMO, solving the initial problem directly as a constrained nonlinear optimization problem (NLP), instead of rewriting it, is the easier approach. Assuming you have all the data points Ia, Ib, Ic and Ie (you didn't provide all of them), you can use the following code snippet, which is in the same vein as the answer of mine linked in your question.
from scipy.optimize import minimize
import numpy as np

def fun_to_fit(coeffs, *args):
    x, y, z = coeffs
    Ia, Ib, Ic = args
    return Ia*x + Ib*y + Ic*z

def objective(coeffs, *args):
    Ia, Ib, Ic, Ie = args
    residual = Ie - fun_to_fit(coeffs, Ia, Ib, Ic)
    return np.sum(residual**2)

# Constraint: x + y + z == 1
con = [{'type': 'eq', 'fun': lambda coeffs: np.sum(coeffs) - 1}]

# bounds
bounds = [(0, 1), (0, 1), (0, 1)]

# initial guess (fulfils the constraint and lies within the bounds)
x0 = np.array([0.25, 0.5, 0.25])

# your given data points
#Ia = np.array(...)
#Ib = np.array(...)
#Ic = np.array(...)
#Ie = np.array(...)

# solve the NLP
res = minimize(lambda coeffs: objective(coeffs, Ia, Ib, Ic, Ie), x0=x0,
               bounds=bounds, constraints=con)
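To see the whole thing run end to end, one can plug in made-up data; the arrays below are placeholders generated from known coefficients, not the asker's measurements:

# Placeholder data, only to make the snippet above runnable end to end.
Ia = np.array([1.0, 2.0, 3.0])
Ib = np.array([2.0, 4.0, 4.0])
Ic = np.array([0.5, 1.0, 1.5])
Ie = 0.2*Ia + 0.5*Ib + 0.3*Ic           # generated with known coefficients

res = minimize(lambda coeffs: objective(coeffs, Ia, Ib, Ic, Ie), x0=x0,
               bounds=bounds, constraints=con)
print(res.x)         # should be approximately [0.2, 0.5, 0.3]
print(res.x.sum())   # ~1.0, so the equality constraint is satisfied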
There is a function which determines the intensity of the Fraunhofer diffraction pattern of a circular aperture... (more information)
The integral of the function over the distance x = [-3.8317, 3.8317] must be about 83.8% (if we assume that I0 is 100), and when you increase the distance to [-13.33, 13.33] it should be about 95%.
But when I compute the integral in Python, the answer is wrong. I don't know what's going wrong in my code :(
from scipy.integrate import quad
from scipy import special as sp

I0 = 100.0
dist = 3.8317
I = quad(lambda x: I0*(2*sp.j1(x)/x)**2, -dist, dist)[0]
print(I)
The result of the integral can't be bigger than 100 (I0), because this is the diffraction of I0... I don't know, maybe scaling... maybe the method! :(
The problem seems to be in the function's behaviour near zero. If the function is plotted, it looks smooth.
However, scipy.integrate.quad complains about round-off errors, which is very strange for such a nice curve. The reason is that the function is not defined at 0 (of course, you are dividing by zero!), hence the integration does not go well.
You may use a simpler integration method or do something about your function near zero. You may also be able to integrate it from both sides up to a point very close to zero. However, even with these fixes the integral does not look right when compared with your expected results.
However, I think I have a hunch of what your problem is. As far as I remember, the integral you have shown is actually the intensity (power/area) of Fraunhofer diffraction as a function of distance from the center. If you want to integrate the total power within some radius, you will have to do it in two dimensions.
By simple area-integration rules you should multiply your function by 2*pi*r before integrating (or by x instead of r in your case). Then it becomes:
f = lambda r: r * (sp.j1(r)/r)**2
or
f = lambda r: sp.j1(r)**2 / r
or even better:
f = lambda r: r * (sp.j0(r) + sp.jn(2, r))**2
The last form is best as it does not suffer from any singularities. It is based on Jaime's comment to the original answer (see the comment below this answer!).
(Note that I omitted a couple of constants.) Now you can integrate it from zero to infinity (no negative radii):
fullpower = quad(f, 1e-9, np.inf)[0]
Then you can integrate from some other radius and normalize by the full intensity:
pwr = quad(f, 1e-9, 3.8317)[0] / fullpower
And you get 0.839 (which is quite close to 84%). If you try the larger radius (13.33):
pwr = quad(f, 1e-9, 13.33)[0] / fullpower
which gives 0.954.
It should be noted that we introduce a small error by starting the integration from 1e-9 instead of 0. The magnitude of the error can be estimated by trying different values for the starting point. The integration result changes very little between 1e-9 and 1e-12, so they seem to be safe. Of course, you could use, e.g., 1e-30, but then there may be numerical instability in the division. (In this case there isn't, but in general singularities are numerically evil.)
Let us do one more thing:
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.01, 20, 1000)
intg = np.array([quad(f, 1e-9, xx)[0] for xx in x])

plt.plot(x, intg/fullpower)
plt.grid(True)
plt.show()
And this is what we get:
At least this looks right, the dark fringes of the Airy disk are clearly visible.
As to the last part of the question: I0 defines the maximum intensity (the units may be, e.g., W/m2), whereas the integral gives the total power (if the intensity is in W/m2, the total power is in W). Setting the maximum intensity to 100 does not guarantee anything about the total power. That is why it is important to calculate the total power.
There actually exists a closed form equation for the total power radiated onto a circular area:
P(x) = P0 ( 1 - J0(x)^2 - J1(x)^2 ),
where P0 is the total power.
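As a quick cross-check (my own addition, reusing the singularity-free integrand from above), the closed form can be compared with the numerical integration:

# Sketch: compare the closed-form encircled power with the numerical integral.
import numpy as np
from scipy import special as sp
from scipy.integrate import quad

f = lambda r: r * (sp.j0(r) + sp.jn(2, r))**2           # proportional to 2*pi*r*I(r)
fullpower = quad(f, 1e-9, np.inf)[0]

closed_form = lambda x: 1 - sp.j0(x)**2 - sp.j1(x)**2   # P(x)/P0

for radius in (3.8317, 13.33):
    numeric = quad(f, 1e-9, radius)[0] / fullpower
    print(radius, numeric, closed_form(radius))         # the two values should agree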
Note that you can also get a closed-form solution for your integral using SymPy:
import sympy as sy
sy.init_printing() # LaTeX like pretty printing in IPython
x,d = sy.symbols("x,d", real=True)
I0=100
dist=3.8317
f = I0*((2*sy.besselj(1,x)/x)**2) # the integrand
F = f.integrate((x, -d, d)) # symbolic integration
print(F.evalf(subs={d:dist})) # numeric evalution
F evaluates to:
1600*d*besselj(0, Abs(d))**2/3 + 1600*d*besselj(1, Abs(d))**2/3 - 800*besselj(1, Abs(d))**2/(3*d)
with besselj(0,r) corresponding to sp.j0(r).
There might be a singularity in the integration algorithm when evaluating the integrand at x = 0. You can exclude this point from the integration with the points argument:
f = lambda x: I0*(2*sp.j1(x)/x)**2
I = quad(f, -dist, dist, points=[0])[0]
I then get the following result (is this your desired result?):
331.4990321315221
I'm solving the integral numerically using python:
where a(x) can take on any value: positive, negative, inside or outside the interval [-1, 1], and eta is an infinitesimally small positive quantity. There is a second, outer integral which changes the value of a(x).
I'm trying to solve this using the Sokhotski–Plemelj theorem:
However, this involves determining the principal value, which I can't find any method for in Python. I know it's implemented in Matlab, but does anyone know of either a library or some other way of determining the principal value in Python (if a principal value exists)?
You can use SymPy to evaluate the integral directly. Its real part with eta -> 0 is the principal value:
from sympy import *
x, y, eta = symbols('x y eta', real=True)
re(integrate(1/(x - y + I*eta), (x, -1, 1))).simplify().subs({eta: 0})
# -> log(Abs(-y + 1)/Abs(y + 1))
Matlab's symbolic toolbox int gives you the same result, of course (I'm not aware of other relevant tools in Matlab for this --- please specify if you know a specific one).
You asked about numerical computation of a principal value. The answer there is that if you only have a function f(y) whose analytical form or behavior you don't know, it's in general impossible to compute them numerically. You need to know things such as where the poles of the integrand are and what order they are.
If you on the other hand know your integral is of the form f(y) / (y - y_0), scipy.integrate.quad can compute the principal value for you, for example:
import numpy as np
from scipy import integrate, special
# P \int_{-1}^1 dx 1/(x - wvar) * (1 + sin(x))
print(integrate.quad(lambda x: 1 + np.sin(x), -1, 1, weight='cauchy', wvar=0))
# -> (1.8921661407343657, 2.426947531830592e-13)
# Check against known result
print(2*special.sici(1)[0])
# -> 1.89216614073
See here for details.
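As a small cross-check (my own addition, using only the calls shown above), the symbolic result log(|1 - y|/|1 + y|) matches quad's Cauchy-weighted integral with f(x) = 1 for any pole location inside (-1, 1):

# Sketch: cross-check the symbolic principal value against quad's Cauchy weight.
import numpy as np
from scipy import integrate

y0 = 0.3   # arbitrary pole location inside (-1, 1)
pv_quad = integrate.quad(lambda x: 1.0, -1, 1, weight='cauchy', wvar=y0)[0]
pv_sympy = np.log(abs(1 - y0) / abs(1 + y0))    # the closed form from above
print(pv_quad, pv_sympy)                        # both approximately -0.619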
Suppose I have a function f(x) defined between a and b. This function can have many zeros, but also many asymptotes. I need to retrieve all the zeros of this function. What is the best way to do it?
Actually, my strategy is the following:
1. I evaluate my function on a given number of points
2. I detect whether there is a change of sign
3. I find the zero between the points that are changing sign
4. I verify whether the zero found is really a zero, or whether it is an asymptote
import numpy
import scipy.optimize

U = numpy.linspace(a, b, 100)  # evaluate function at 100 different points
c = f(U)
s = numpy.sign(c)
for i in range(100-1):
    if s[i] + s[i+1] == 0:  # opposite signs
        u = scipy.optimize.brentq(f, U[i], U[i+1])
        z = f(u)
        if numpy.isnan(z) or abs(z) > 1e-3:
            continue
        print('found zero at {}'.format(u))
This algorithm seems to work, except I see two potential problems:
It will not detect a zero that doesn't cross the x axis (for example, in a function like f(x) = x**2). However, I don't think this can occur with the function I'm evaluating.
If the discretization points are too far apart, there could be more than one zero between them, and the algorithm could fail to find them.
Do you have a better strategy (still efficient) to find all the zeros of a function?
I don't think it's important for the question, but for those who are curious, I'm dealing with characteristic equations of wave propagation in optical fiber. The function looks like this (where V and ell are previously defined, and ell is a positive integer):
def f(u):
    w = numpy.sqrt(V**2 - u**2)
    jl = scipy.special.jn(ell, u)
    jl1 = scipy.special.jnjn(ell-1, u)
    kl = scipy.special.jnkn(ell, w)
    kl1 = scipy.special.jnkn(ell-1, w)
    return jl / (u*jl1) + kl / (w*kl1)
Why are you limited to numpy? Scipy has a package that does exactly what you want:
http://docs.scipy.org/doc/scipy/reference/optimize.nonlin.html
One lesson I've learned: numerical programming is hard, so don't do it :)
Anyway, if you're dead set on building the algorithm yourself, the doc page on scipy I linked (takes forever to load, btw) gives you a list of algorithms to start with. One method that I've used before is to discretize the function to the degree that is necessary for your problem. (That is, tune \delta x so that it is much smaller than the characteristic size in your problem.) This lets you look for features of the function (like changes in sign). AND, you can compute the derivative of a line segment (probably since kindergarten) pretty easily, so your discretized function has a well-defined first derivative. Because you've tuned the dx to be smaller than the characteristic size, you're guaranteed not to miss any features of the function that are important for your problem.
If you want to know what "characteristic size" means, look for some parameter of your function with units of length or 1/length. That is, for some function f(x), assume x has units of length and f has no units. Then look for the things that multiply x. For example, if you want to discretize cos(\pi x), the parameter that multiplies x (if x has units of length) must have units of 1/length. So the characteristic size of cos(\pi x) is 1/\pi. If you make your discretization much smaller than this, you won't have any issues. To be sure, this trick won't always work, so you may need to do some tinkering.
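A toy illustration of that rule (my own, not from the answer): for g(x) = cos(pi*x) the characteristic size is 1/pi, so a grid much finer than that reliably catches every sign change:

# Sketch: discretize much finer than the characteristic size and look for sign changes.
import numpy as np

dx = (1 / np.pi) / 20                        # much smaller than the characteristic size
x = np.arange(0.0, 5.0, dx)
g = np.cos(np.pi * x)
sign_changes = np.where(np.diff(np.sign(g)) != 0)[0]
print(x[sign_changes])                       # close to 0.5, 1.5, 2.5, ... as expected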
I found that it's relatively easy to implement your own root finder using scipy.optimize.fsolve.
Idea: find the zeros in the interval (start, stop) with step size step by calling fsolve repeatedly with a changing x0. Use a relatively small step size to find all the roots.
It can only search for zeros in one dimension (the other dimensions must be fixed). If you have other needs, I would recommend using SymPy to calculate an analytical solution.
Note: It may not always find all the zeroes, but I saw it giving relatively good results. I put the code also to a gist, which I will update if needed.
import numpy as np
import scipy
from scipy.optimize import fsolve
from matplotlib import pyplot as plt
# Defined below
r = RootFinder(1, 20, 0.01)
args = (90, 5)
roots = r.find(f, *args)
print("Roots: ", roots)
# plot results
u = np.linspace(1, 20, num=600)
fig, ax = plt.subplots()
ax.plot(u, f(u, *args))
ax.scatter(roots, f(np.array(roots), *args), color="r", s=10)
ax.grid(color="grey", ls="--", lw=0.5)
plt.show()
Example output:
Roots: [ 2.84599497 8.82720551 12.38857782 15.74736542 19.02545276]
RootFinder definition
import numpy as np
import scipy
from scipy.optimize import fsolve
from matplotlib import pyplot as plt


class RootFinder:
    def __init__(self, start, stop, step=0.01, root_dtype="float64", xtol=1e-9):
        self.start = start
        self.stop = stop
        self.step = step
        self.xtol = xtol
        self.roots = np.array([], dtype=root_dtype)

    def add_to_roots(self, x):
        if (x < self.start) or (x > self.stop):
            return  # outside range
        if any(abs(self.roots - x) < self.xtol):
            return  # root already found
        self.roots = np.append(self.roots, x)

    def find(self, f, *args):
        current = self.start
        for x0 in np.arange(self.start, self.stop + self.step, self.step):
            if x0 < current:
                continue
            x = self.find_root(f, x0, *args)
            if x is None:  # no root found
                continue
            current = x
            self.add_to_roots(x)
        return self.roots

    def find_root(self, f, x0, *args):
        x, _, ier, _ = fsolve(f, x0=x0, args=args, full_output=True, xtol=self.xtol)
        if ier == 1:
            return x[0]
        return None
Test function
The scipy.special.jnjn does not exist anymore, but I created a similar test function for this case.
def f(u, V=90, ell=5):
    w = np.sqrt(V**2 - u**2)
    jl = scipy.special.jn(ell, u)
    jl1 = scipy.special.yn(ell - 1, u)
    kl = scipy.special.kn(ell, w)
    kl1 = scipy.special.kn(ell - 1, w)
    return jl / (u * jl1) + kl / (w * kl1)
The main problem I see with this is whether you can actually find all the roots; as has already been mentioned in the comments, this is not always possible. If you are sure that your function is not completely pathological (sin(1/x) was already mentioned), the next question is what your tolerance is for missing a root or several of them. Put differently, it's about what lengths you are prepared to go to in order to make sure you did not miss any; to the best of my knowledge, there is no general method to isolate all the roots for you, so you'll have to do it yourself. What you show is a reasonable first step already. A couple of comments:
Brent's method is indeed a good choice here.
First of all, deal with the divergences. Since your function has Bessel functions in the denominators, you can first solve for their roots; better yet, look them up in, e.g., Abramowitz and Stegun (MathWorld link). This will be better than the ad hoc grid you're using.
Once you've found two roots or divergences, x_1 and x_2, run the search again in the interval [x_1+epsilon, x_2-epsilon]. Continue until no more roots are found (Brent's method is guaranteed to converge to a root, provided there is one). A minimal sketch of this bracketing idea follows the list below.
If you cannot enumerate all the divergences, you might want to be a little more careful when verifying that a candidate is indeed a divergence: given x, don't just check that f(x) is large; check that, e.g., |f(x-epsilon/2)| > |f(x-epsilon)| for several values of epsilon (1e-8, 1e-9, 1e-10, something like that).
If you want to make sure you don't have roots which merely touch zero, look for the extrema of the function, and for each extremum, x_e, check the value of f(x_e).
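Here is the promised sketch of the bracketing idea (the helper name and the grid-based sign check inside each interval are my own choices, not part of the answer above):

# Sketch: bracket roots between known roots/divergences and re-run Brent's method.
import numpy as np
from scipy.optimize import brentq

def roots_between(f, breakpoints, eps=1e-6, samples=200):
    # breakpoints: sorted known roots and divergences of f; f must accept arrays
    found = []
    for left, right in zip(breakpoints[:-1], breakpoints[1:]):
        xs = np.linspace(left + eps, right - eps, samples)
        ys = f(xs)
        for x0, x1, y0, y1 in zip(xs[:-1], xs[1:], ys[:-1], ys[1:]):
            if y0 * y1 < 0:                 # sign change -> a root is bracketed
                found.append(brentq(f, x0, x1))
    return found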
I've also encountered this problem when solving equations like f(z) = 0, where f was a holomorphic function. I wanted to be sure not to miss any zero and finally developed an algorithm based on the argument principle.
It helps to find the exact number of zeros lying in a complex domain. Once you know the number of zeros, it is easier to find them. There are, however, two concerns which must be taken into account:
Take care about multiplicity: when solving (z-1)^2 = 0, you'll get two zeros, as z = 1 counts twice.
If the function is meromorphic (and thus contains poles), each pole reduces the number of zeros and breaks the attempt to count them.
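For illustration, a minimal numerical version of the argument principle (my own sketch, assuming the function is holomorphic inside the contour, i.e. no poles):

# Sketch: count zeros inside a circle via (1/(2*pi*i)) * contour integral of f'(z)/f(z).
import numpy as np

def count_zeros(f, df, center=0.0, radius=1.0, n=4000):
    t = np.linspace(0.0, 2.0*np.pi, n, endpoint=False)
    z = center + radius*np.exp(1j*t)
    dz = 1j*radius*np.exp(1j*t)                     # dz/dt on the circle
    integral = np.sum(df(z)/f(z)*dz) * (2.0*np.pi/n)
    return int(round((integral/(2j*np.pi)).real))

# example: (z - 0.5)**2 * (z + 2) has a double zero inside the unit circle
f  = lambda z: (z - 0.5)**2 * (z + 2)
df = lambda z: 2*(z - 0.5)*(z + 2) + (z - 0.5)**2
print(count_zeros(f, df))   # -> 2, the multiplicity is counted, as noted above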
I am trying to fit a step function using scipy.optimize.leastsq. Consider the following example:
import numpy as np
from scipy.optimize import leastsq
def fitfunc(p, x):
    y = np.zeros(x.shape)
    y[x < p[0]] = p[1]
    y[p[0] < x] = p[2]
    return y
errfunc = lambda p, x, y: fitfunc(p, x) - y # Distance to the target function
x = np.arange(1000)
y = np.random.random(1000)
y[x < 250.] -= 10
p0 = [500.,0.,0.]
p1, success = leastsq(errfunc, p0, args=(x, y))
print(p1)
The parameters are the location of the step and the level on either side. What's strange is that the first free parameter never varies; if you run this, scipy will give
[ 5.00000000e+02 -4.49410173e+00 4.88624449e-01]
whereas the first parameter would be optimal when set to 250 and the second to -10.
Does anyone have any insight as to why this might not be working and how to get it to work?
If I run
print(np.sum(errfunc(p1, x, y)**2.))
print(np.sum(errfunc([250., -10., 0.], x, y)**2.))
I find:
12547.1054663
320.679545235
where the first number is what leastsq is finding, and the second is the value for the actual optimal function it should be finding.
It turns out that the fitting is much better if I add the epsfcn= argument to leastsq:
p1, success = leastsq(errfunc, p0, args=(x, y), epsfcn=10.)
and the result is
[ 248.00000146 -8.8273455 0.40818216]
My basic understanding is that the first free parameter has to be moved by more than the spacing between neighbouring points to affect the square of the residuals, and epsfcn has something to do with how big a step is used to find the gradient, or something similar.
I don't think that least squares fitting is the way to go about coming up with an approximation for a step. I don't believe it will give you a satisfactory description of the discontinuity. Least squares would not be my first thought when attacking this problem.
Why wouldn't you use a Fourier series approximation instead? You'll always be stuck with Gibbs' phenomenon at the discontinuity, but the rest of the function can be approximated as well as you and your CPU can afford.
What exactly are you going to use this for? Some context might help.
I propose approximating the step function. Instead of an infinite slope at the "change point", make it linear over one x distance (1.0 in the example). E.g. if the x parameter, xp, for the function is defined as the midpoint of this line, then the value at xp-0.5 is the lower y value, the value at xp+0.5 is the higher y value, and intermediate values of the function in the interval [xp-0.5, xp+0.5] are a linear interpolation between these two points.
If it can be assumed that the step function (or its approximation) goes from a lower value to a higher value, then I think the initial guesses for the last two parameters should be the lowest y value and the highest y value, respectively, instead of 0.0 and 0.0.
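A minimal sketch of that ramp, assuming the width of 1.0 used above (the name fitfunc_smooth and the initial-guess line are mine, not part of the answer):

import numpy as np

def fitfunc_smooth(p, x):
    # p[0] = midpoint of the ramp, p[1] = lower level, p[2] = higher level
    xp, y_lo, y_hi = p
    y = np.full(x.shape, y_lo, dtype=float)
    y[x >= xp + 0.5] = y_hi
    ramp = (x > xp - 0.5) & (x < xp + 0.5)
    y[ramp] = y_lo + (y_hi - y_lo) * (x[ramp] - (xp - 0.5))  # linear interpolation, width 1
    return y

# initial guess along the lines suggested above:
# p0 = [x.mean(), y.min(), y.max()]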
I have 2 corrections:
1) np.random.random() returns random numbers in the range 0.0 to 1.0. Thus the mean is +0.5, which is also the value of the third parameter (instead of 0.0), and the second parameter is then -9.5 (+0.5 - 10.0) instead of -10.0. Thus
print(np.sum(errfunc([250., -10., 0.], x, y)**2.))
should be
print(np.sum(errfunc([250., -9.5, 0.5], x, y)**2.))
2) In the original fitfunc(), one value of y becomes 0.0 if x is exactly equal to p[0]. Thus it is not a step function in that case (it is more like a sum of two step functions). E.g. this happens when the start value of the first parameter is 500.
Most probably your optimization is stuck in a local minimum. I don't know how leastsq really works, but if you give it an initial estimate of (0, 0, 0), it gets stuck there too.
You can check the gradient at the initial estimate numerically (evaluate at +/- epsilon for a very small epsilon and divide by 2*epsilon, i.e. take the central difference), and I bet it will be something around 0.
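A quick way to see this numerically, reusing errfunc, x, y and p0 from the question above (this check is my own addition):

# Central-difference gradient of the cost at the initial estimate p0.
import numpy as np

cost = lambda p: np.sum(errfunc(p, x, y)**2)
eps = 1e-6
p0 = np.array([500., 0., 0.])
grad = [(cost(p0 + eps*e) - cost(p0 - eps*e)) / (2*eps) for e in np.eye(3)]
print(grad)   # the component for the step location is essentially 0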
Use statsmodels OLS; it uses ordinary least squares for curve fitting.