Python: Minimization of a function with potentially random outputs

I'm looking to minimize a function with potentially random outputs. Traditionally, I would use something from the scipy.optimize library, but I'm not sure if it'll still work if the outputs are not deterministic.
Here's a minimal example of the problem I'm working with:
import random
def myfunction(a):
    noise = random.gauss(0, 1)
    return abs(a + noise)
Any thoughts on how to algorithmically minimize its expected (or average) value?
A numerical approximation would be fine, as long as it can get "relatively" close to the actual value.
We already reduced noise by averaging over many possible runs, but the function is a bit computationally expensive and we don't want to do more averaging if we can help it.

It turns out that, for our application, the scipy.optimize anneal algorithm provided a good enough estimate of the local maximum.
For more complex problems, pjs points out that Waeber, Frazier and Henderson (2011) provide a better solution.
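One way to make a noisy objective usable with an ordinary deterministic minimizer is to average a fixed set of noise draws, so that every candidate value of a sees the same noise. The sketch below is only an illustration under assumptions that are not in the question: it assumes the noise source can be seeded, it uses a stand-in objective named noisy_objective, and it uses a hand-written golden-section search instead of any scipy routine. It does not reduce the number of evaluations per candidate, so it only helps if the averaging budget is affordable.

```python
import random


def noisy_objective(a, rng):
    """Stand-in for the expensive function: |a + Gaussian noise|."""
    return abs(a + rng.gauss(0, 1))


def averaged_objective(a, n_samples=2000, seed=0):
    """Average n_samples evaluations, reusing the same noise for every a."""
    rng = random.Random(seed)  # same seed -> same noise sequence for each a
    return sum(noisy_objective(a, rng) for _ in range(n_samples)) / n_samples


def golden_section_minimize(f, lo, hi, tol=1e-4):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:
            # The minimum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # The minimum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


best_a = golden_section_minimize(averaged_objective, -5.0, 5.0)
best_value = averaged_objective(best_a)
print(best_a, best_value)
```

With frozen noise the averaged function is an ordinary convex function of a, so the search cannot be misled by run-to-run fluctuations. The minimizer it finds is still only an estimate whose accuracy depends on n_samples.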

Related

SciPy rootfinding algorithm 'gives up' too fast

Is there any way to force the 'hybr' method of scipy.optimize.root to keep working even after it finds that convergence is too slow? In my problem, the solver nearly reaches the desired precision, but just before it gets there the algorithm terminates because of slow convergence. Is it possible to make 'hybr' more 'self-confident'?
I use the root function from the scipy.optimize module to solve a system of two algebraic, non-linear equations. Since the equations have to be solved many times for various parameter values, it is important to find the numerical method that is most stable for this problem.
I have compared the performance of all the methods provided by scipy.optimize module. To visualize their performance I have used the following procedure:
The algebraic equations were rearranged so that they have zero on the R.H.S.
Then, at each step made by the algorithm, the sum of the L.H.S. squared of all the equations was computed and printed.
In my case, the most efficient method is the default "hybr". Other built-in methods either do not converge at all or are significantly slower. Unfortunately, in some cases the preferred method gives up too fast. Lowering the precision and/or providing additional options to the function did not help.
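The monitoring procedure described above (print the sum of squared left-hand sides at every step) can be sketched as a small wrapper around the system of equations. This is an assumption about how it might be done, not the asker's actual code: ResidualLogger and example_system are hypothetical names, and the two equations are invented for illustration.

```python
class ResidualLogger:
    """Wrap a system F(x) = 0 and record sum(F_i(x)**2) at every evaluation."""

    def __init__(self, system):
        self.system = system
        self.history = []

    def __call__(self, x):
        values = self.system(x)
        self.history.append(sum(v * v for v in values))
        return values


def example_system(x):
    """Hypothetical two-equation system with a root at (1, 2)."""
    return [x[0] ** 2 + x[1] - 3.0, x[0] + x[1] ** 2 - 5.0]


logged = ResidualLogger(example_system)
# A solver would call `logged` instead of the raw system, for example
# scipy.optimize.root(logged, [0.5, 0.5], method="hybr").
# Here it is called by hand at two points to show what gets recorded.
logged([0.5, 0.5])
logged([1.0, 2.0])
print(logged.history)
```

Because the wrapper records every function evaluation, the history also contains the extra evaluations a solver spends on estimating the Jacobian, not only its accepted steps.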

Numerical integration: Why does my orbit simulation yield the wrong result?

I read Chapter 9 of Feynman's Lectures on Physics and tried to write my own simulation. I used Riemann sums to calculate velocity and position. Although all the starting values are the same, my orbit looks like a hyperbola.
Here is lecture note: https://www.feynmanlectures.caltech.edu/I_09.html (Table 9.2)
import time
import matplotlib.pyplot as plt
x=list()
y=list()
x_in=0.5
y_in=0.0
x.append(x_in)
y.append(y_in)
class Planet:
    def __init__(self,m,rx,ry,vx,vy,G=1):
        self.m=m
        self.rx=rx
        self.ry=ry
        self.a=0
        self.ax=0
        self.ay=0
        self.vx=vx
        self.vy=vy
        self.r=(rx**2+ry**2)**(1/2)
        self.f=0
        self.G=1
        print(self.r)
    def calculateOrbit(self,dt,planet):
        self.rx=self.rx+self.vx*dt
        self.vx=self.vx+self.ax*dt
        self.ax=0
        for i in planet:
            r=((((self.rx-i.rx)**2)+((self.ry-i.ry)**2))**(1/2))
            self.ax+=-self.G*i.m*self.rx/(r**3)
        self.ry=self.ry+self.vy*dt
        self.vy=self.vy+self.ay*dt
        self.ay=0
        for i in planet:
            self.ay+=-self.G*i.m*self.ry/(r**3)
        global x,y
        x.append(self.rx)
        y.append(self.ry)
        #self.showOrbit()
    def showOrbit(self):
        print(""" X: {} Y: {} Ax: {} Ay: {}, Vx: {}, Vy: {}""".format(self.rx,self.ry,self.ax,self.ay,self.vx,self.vy))
planets=list();
earth = Planet(1,x_in,y_in,0,1.630)
sun = Planet(1,0,0,0,0)
planets.append(sun)
for i in range(0,1000):
    earth.calculateOrbit(0.1,planets)
plt.plot(x,y)
plt.grid()
plt.xlim(-20.0,20.0)
plt.ylim(-20.0,20.0)
plt.show()

dt is supposed to be infinitely small for the integration to work.
The bigger dt is, the bigger the "local linearization error" (there is probably a nicer mathematical term for that...).
0.1 for dt may look small enough, but for your result to converge correctly (towards reality) you need to check smaller time steps, too. If smaller time steps converge towards the same solution, your equation is linear enough to use a bigger time step (and save computation time).
Try your code with
for i in range(0, 10000):
    earth.calculateOrbit(0.01, planets)
and
for i in range(0, 100000):
    earth.calculateOrbit(0.001, planets)
In both calculations the overall time that has passed since the beginning is the same as with your original values. But the result is quite different. So you might have to use an even smaller dt.
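The step-size check described above can be sketched without the plotting code. The snippet below is a simplified stand-in, not the asker's class: it assumes a single unit mass fixed at the origin with G = 1, uses plain explicit Euler, and measures how far the total energy drifts from its initial value for each dt over the same total simulated time. The names euler_orbit and energy are introduced here for illustration.

```python
def euler_orbit(dt, total_time=10.0):
    """Explicit Euler for a body orbiting a unit mass fixed at the origin (G = 1)."""
    x, y, vx, vy = 0.5, 0.0, 0.0, 1.63
    for _ in range(round(total_time / dt)):
        r3 = (x * x + y * y) ** 1.5
        ax, ay = -x / r3, -y / r3
        # Both updates use the values from the start of the step.
        x, y = x + vx * dt, y + vy * dt
        vx, vy = vx + ax * dt, vy + ay * dt
    return x, y, vx, vy


def energy(x, y, vx, vy):
    """Kinetic plus potential energy per unit mass; constant on the exact orbit."""
    return 0.5 * (vx * vx + vy * vy) - 1.0 / (x * x + y * y) ** 0.5


e0 = energy(0.5, 0.0, 0.0, 1.63)
energy_error = {}
for dt in (0.1, 0.01, 0.001):
    energy_error[dt] = abs(energy(*euler_orbit(dt)) - e0)
    print(dt, energy_error[dt])
```

Energy drift is used here only as a convenient scalar measure of error; comparing the final positions for successive values of dt would serve the same purpose.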
More info:
https://en.wikipedia.org/wiki/Trapezoidal_rule
And this page states what you are doing:
A 'brute force' kind of numerical integration can be done, if the
integrand is reasonably well-behaved (i.e. piecewise continuous and of
bounded variation), by evaluating the integrand with very small
increments.
And what I tried to explain above:
An important part of the analysis of any numerical integration method
is to study the behavior of the approximation error as a function of
the number of integrand evaluations.
There are many smart approaches that make better guesses and allow larger time steps. You might have heard of the Runge–Kutta method for solving differential equations. For plain integrals (rather than differential equations) it seems to reduce to Simpson's rule, which is mentioned on the page linked above.

The problem lies in the numerical method used to solve the differential equations. The method you're using (Euler's method) gives a value very close to the true one at each step, but still with a small error. When this slightly wrong value is fed into further iterations (you take 1000 steps), the error grows with every step, which is what gave you the wrong result.
There are two solutions to this problem:
1. Decrease the time interval to an even smaller value so that, even after the errors are amplified throughout the process, the result does not deviate much from the actual solution. Note that if you decrease the time interval (dt) without increasing the number of steps, you only see the evolution of the system over a shorter period of time. You therefore have to increase the number of steps as you decrease dt. I checked your code, and it seems that if you set dt = 0.0001 and the number of steps to 100000 instead of just 1000, you get the elliptical orbit you're looking for. Also, delete or comment out the plt.xlim and plt.ylim lines to see your plot clearly.
2. Implement a Runge-Kutta method for the numerical solution of differential equations. This method converges better per iteration to the actual solution. I'm suggesting it as the second option only because it takes more time and more changes to your code; otherwise it is superior to, and more general than, Euler's method.
Note: Even without any changes, the solution is not behaving like a hyperbola. For the initial values you've provided, the solution is a bounded curve, but because of the error amplification it spirals into a point. You'll notice this spiraling in if you just increase the steps to 10000 and set dt = 0.01.
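For the second option, a classical fourth-order Runge-Kutta step might look like the sketch below. It is a simplified illustration rather than a drop-in change to the Planet class: it assumes one unit mass fixed at the origin with G = 1, keeps the state as a plain (x, y, vx, vy) tuple, and the function names are made up for this example.

```python
def derivative(state):
    """Time derivative of (x, y, vx, vy) for a unit mass at the origin, G = 1."""
    x, y, vx, vy = state
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -x / r3, -y / r3)


def shifted(state, slope, h):
    """Return state + h * slope, component by component."""
    return tuple(s + h * k for s, k in zip(state, slope))


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = derivative(state)
    k2 = derivative(shifted(state, k1, dt / 2))
    k3 = derivative(shifted(state, k2, dt / 2))
    k4 = derivative(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def energy(state):
    """Kinetic plus potential energy per unit mass."""
    x, y, vx, vy = state
    return 0.5 * (vx * vx + vy * vy) - 1.0 / (x * x + y * y) ** 0.5


state = (0.5, 0.0, 0.0, 1.63)
e0 = energy(state)
radii = []
for _ in range(10000):
    state = rk4_step(state, 0.01)
    radii.append((state[0] ** 2 + state[1] ** 2) ** 0.5)
print(min(radii), max(radii), energy(state) - e0)
```

Each step costs four evaluations of the acceleration instead of one, which is the price paid for the higher order of accuracy.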

Integration with Scipy giving incorrect results with negative lower bound

I am attempting to calculate integrals between two limits using python/scipy.
I am using online calculators to double check my results (http://www.wolframalpha.com/widgets/view.jsp?id=8c7e046ce6f4d030f0b386ea5c17b16a, http://www.integral-calculator.com/), and my results disagree when I have certain limits set.
The code used is:
import numpy as np
from scipy import integrate
def integrand(x):
    return np.exp(-0.5*x**2)
def int_test(a,b):
    # a and b are the lower and upper bounds of the integration
    return integrate.quad(integrand,a,b)
When setting the limits (a,b) to (-np.inf,1) I get answers that agree (2.10894...)
however if I set (-np.inf,300) I get an answer of zero.
On further investigation using:
for i in range(50):
    print(i,int_test(-np.inf,i))
I can see that the result goes wrong at i=36.
I was wondering if there was a way to avoid this?
Thanks,
Matt

I am guessing this has to do with the infinite bounds. scipy.integrate.quad is a wrapper around quadpack routines.
https://people.sc.fsu.edu/~jburkardt/f_src/quadpack/quadpack.html
In the end, these routines choose suitable intervals and try to get the value of the integral through function evaluations followed by numerical integration. This works fine for finite integrals, assuming you know roughly how fine you can make the steps of the function evaluation.
For infinite integrals it depends how well the algorithms choose respective subintervals and how accurately they are computed.
My advice: do NOT use numerical integration software AT ALL if you are interested in accurate values for infinite integrals.
If your problem can be solved analytically, try that or confine yourself to certain bounds.
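For this particular integrand the analytic route is available: the integral of exp(-x**2/2) from minus infinity to b equals sqrt(pi/2) * (1 + erf(b / sqrt(2))). A minimal sketch using only the standard library follows; the function name gaussian_integral_to is introduced here for illustration.

```python
import math


def gaussian_integral_to(b):
    """Integral of exp(-x**2 / 2) from -infinity to b, via the error function."""
    return math.sqrt(math.pi / 2) * (1.0 + math.erf(b / math.sqrt(2)))


for upper in (1, 36, 300):
    print(upper, gaussian_integral_to(upper))
```

For an upper limit of 1 this agrees with the 2.10894... quoted in the question, and for large upper limits it approaches sqrt(2*pi) instead of dropping to zero.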

Inverse Matrix (Numpy) int too large to convert to float

I am trying to take the inverse of a 365x365 matrix. Some of the values get as large as 365**365 and so they are converted to long numbers. I don't know if the linalg.matrix_power() function can handle long numbers. I know the problem comes from this (because of the error message and because my program works just fine for smaller matrices) but I am not sure if there is a way around this. The code needs to work for a NxN matrix.
Here's my code:
item=0
for i in xlist:
    xtotal.append(arrayit.arrayit(xlist[item],len(xlist)))
    item=item+1
print(xtotal)
xinverted=numpy.linalg.matrix_power(xtotal,-1)
coeff=numpy.dot(xinverted,ylist)
arrayit.arrayit:
def arrayit(number, length):
    newarray=[]
    import decimal
    i=0
    while i!=(length):
        newarray.insert(0,decimal.Decimal(number**i))
        i=i+1
    return newarray
The program is taking x,y coordinates from a list (list of x's and list of y's) and makes a function.
Thanks!

One thing you might try is the library mpmath, which can do simple matrix algebra and other such problems on arbitrary precision numbers.
A couple of caveats: It will almost certainly be slower than using numpy, and, as Lutzl points out in his answer to this question, the problem may well not be mathematically well defined. Also, you need to decide on the precision you want before you start.
Some brief example code,
from mpmath import mp, matrix
# set the precision - see http://mpmath.org/doc/current/basics.html#setting-the-precision
mp.prec = 5000 # set it to something big at the cost of speed.
# Ideally you'd precalculate what you need.
# a quick trial with 100*100 showed that 5000 works and 500 fails
# see the documentation at http://mpmath.org/doc/current/matrices.html
# where xtotal is the output from arrayit
my_matrix = matrix(xtotal) # I think this should work. If not you'll have to create it and copy
# do the inverse
xinverted = my_matrix**-1
coeff = xinverted*matrix(ylist)
# note that, as Lutzl pointed out, you really want to use solve instead of calculating the inverse.
# I think this is something like
from mpmath import lu_solve
coeff = lu_solve(my_matrix,matrix(ylist))
I suspect your real problem is with the maths rather than the software, so I doubt this will work fantastically well for you, but it's always possible!

Did you ever hear of Lagrange or Newton interpolation? This would avoid the whole construction of the Vandermonde matrix, though not the potentially large numbers in the coefficients.
As a general observation, you do not want the inverse matrix. You do not need to compute it. What you want is to solve a system of linear equations.
x = numpy.linalg.solve(A, b)
solves the system A*x=b.
You (really) might want to look up the Runge effect. Interpolation with equally spaced sample points is an increasingly ill-conditioned task. Useful results can be obtained for single-digit degrees; larger degrees tend to give wildly oscillating polynomials.
You can often use polynomial regression, i.e., approximating your data set by the best polynomial of some low degree.
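To illustrate the Newton interpolation mentioned at the start of this answer, here is a minimal sketch that builds the interpolating polynomial from divided differences, with no Vandermonde matrix and no matrix inverse. Using fractions.Fraction for exact arithmetic is an assumption made for this example (it sidesteps the overflow in the question but does nothing about the Runge effect), and the function names are hypothetical.

```python
from fractions import Fraction


def newton_coefficients(xs, ys):
    """Divided-difference coefficients of the interpolating polynomial."""
    xs = [Fraction(x) for x in xs]
    coeffs = [Fraction(y) for y in ys]
    n = len(xs)
    for order in range(1, n):
        # Update in place from the end so lower-order entries stay available.
        for i in range(n - 1, order - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - order])
    return xs, coeffs


def newton_evaluate(xs, coeffs, x):
    """Evaluate the Newton-form polynomial at x by nested multiplication."""
    result = coeffs[-1]
    for xi, c in zip(reversed(xs[:-1]), reversed(coeffs[:-1])):
        result = result * (x - xi) + c
    return result


# Sample data taken from y = x**2 + 1.
nodes, coeffs = newton_coefficients([0, 1, 2, 3], [1, 2, 5, 10])
print(newton_evaluate(nodes, coeffs, 4))
print(newton_evaluate(nodes, coeffs, Fraction(1, 2)))
```

The x values must be distinct, otherwise a divided difference divides by zero.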

Parallel many dimensional optimization

I am building a script that generates input data [parameters] for another program to calculate. I would like to optimize the resulting data. Previously I have been using the SciPy Powell optimization. The pseudocode looks something like this.
def value(param):
    run_program(param)
    #Parse output
    return value
scipy.optimize.fmin_powell(value,param)
This works great; however, it is incredibly slow as each iteration of the program can take days to run. What I would like to do is coarse grain parallelize this. So instead of running a single iteration at a time it would run (number of parameters)*2 at a time. For example:
Initial guess: param=[1,2,3,4,5]
#Modify guess by plus minus another matrix that is changeable at each iteration
jump=[1,1,1,1,1]
#Modify each variable plus/minus jump.
for num,a in enumerate(param):
    new_param1=param[:]
    new_param1[num]=new_param1[num]+jump[num]
    run_program(new_param1)
    new_param2=param[:]
    new_param2[num]=new_param2[num]-jump[num]
    run_program(new_param2)
#Wait until all programs are complete -> Parse Output
Output=[[value,param],...]
#Create new guess
#Repeat
The number of variables can range from 3 to 12, so something like this could potentially cut the run time from a year down to a week. All variables depend on each other, and I am only looking for local minima from the initial guess. I have started an implementation using Hessian matrices; however, that is quite involved. Is there anything out there that already does this, is there a simpler way, or are there any suggestions for getting started?
So the primary question is the following:
Is there an algorithm that takes a starting guess, generates multiple guesses, then uses those multiple guesses to create a new guess, and repeats until a threshold is reached? Only analytic derivatives are available. What is a good way of going about this, is there something already built that does this, and are there other options?
Thank you for your time.
As a small update I do have this working by calculating simple parabolas through the three points of each dimension and then using the minima as the next guess. This seems to work decently, but is not optimal. I am still looking for additional options.
The current best implementation is parallelizing the inner loop of Powell's method.
Thank you everyone for your comments. Unfortunately it looks like there is simply not a concise answer to this particular problem. If I get around to implementing something that does this I will paste it here; however, as the project is not particularly important and the need for results is not pressing, I will likely be content letting it take up a node for a while.
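The parabola update mentioned in the question (fit a parabola through three points in each dimension and jump to its minimum) can be sketched as follows. This is a guess at the idea, not the asker's implementation: parabolic_update is a hypothetical name, the evaluations are done serially here although each pair is independent and could be launched in parallel, and the fallback for non-convex slices is an assumption.

```python
def parabolic_update(f, param, jump):
    """One update: per coordinate, jump to the vertex of the parabola
    through f(param - jump), f(param) and f(param + jump)."""
    f_mid = f(param)
    new_param = list(param)
    for i, h in enumerate(jump):
        plus = list(param)
        plus[i] += h
        minus = list(param)
        minus[i] -= h
        # These two evaluations are independent of all the others.
        f_plus, f_minus = f(plus), f(minus)
        curvature = f_plus - 2.0 * f_mid + f_minus
        if curvature > 0:
            # The parabola opens upward, so its vertex is a minimum.
            new_param[i] = param[i] - 0.5 * h * (f_plus - f_minus) / curvature
        else:
            # No interior minimum: move to the lower of the two neighbours.
            new_param[i] = plus[i] if f_plus < f_minus else minus[i]
    return new_param


def separable_test_function(p):
    """Hypothetical objective with its minimum at (3, -1)."""
    return (p[0] - 3.0) ** 2 + 2.0 * (p[1] + 1.0) ** 2


print(parabolic_update(separable_test_function, [0.0, 0.0], [1.0, 1.0]))
```

On a separable quadratic like the test function a single update lands on the minimum; when the variables interact, as they do in the question, the update has to be repeated and the jump sizes adjusted between iterations.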

I had the same problem while I was at university: we had a Fortran algorithm that calculated the efficiency of an engine based on a group of variables. At the time we used modeFRONTIER and, if I recall correctly, none of the algorithms were able to generate multiple guesses.
The normal approach would be to have a DOE (design of experiments), and there were some algorithms to generate the DOE that best fits your problem. After that we would run the individual DOE entries in parallel, and an algorithm would "watch" the development of the optimizations, showing the current best design.
Side note: If you don't have a cluster and need more computing power, HTCondor may help you.

Are derivatives of your goal function available? If yes, you can use gradient descent (old, slow but reliable) or conjugate gradient. If not, you can approximate the derivatives using finite differences and still use these methods. I think in general, if using finite difference approximations to the derivatives, you are much better off using conjugate gradients rather than Newton's method.
A more modern method is SPSA, which is a stochastic method and doesn't require derivatives. For somewhat well-behaved problems, SPSA requires far fewer evaluations of the goal function for the same rate of convergence than the finite difference approximation to conjugate gradients.
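A bare-bones SPSA loop might look like the sketch below. The gain constants a, c and A are illustrative assumptions that would need tuning for a real problem; the decay exponents 0.602 and 0.101 are the values commonly recommended in the SPSA literature. The test objective is deterministic only to keep the example checkable.

```python
import random


def spsa_minimize(f, x0, iterations=2000, a=0.2, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Simultaneous perturbation stochastic approximation (basic form)."""
    rng = random.Random(seed)
    x = list(x0)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** alpha   # step size, decays slowly
        c_k = c / (k + 1) ** gamma       # perturbation size, decays very slowly
        # Perturb every coordinate at once with independent random signs.
        delta = [rng.choice((-1.0, 1.0)) for _ in x]
        x_plus = [xi + c_k * di for xi, di in zip(x, delta)]
        x_minus = [xi - c_k * di for xi, di in zip(x, delta)]
        # Two evaluations per iteration, whatever the dimension.
        diff = (f(x_plus) - f(x_minus)) / (2.0 * c_k)
        x = [xi - a_k * diff / di for xi, di in zip(x, delta)]
    return x


def quadratic_bowl(p):
    """Hypothetical objective with its minimum at (1, -2, 3)."""
    target = (1.0, -2.0, 3.0)
    return sum((pi - ti) ** 2 for pi, ti in zip(p, target))


solution = spsa_minimize(quadratic_bowl, [0.0, 0.0, 0.0])
print(solution)
```

The two evaluations in each iteration are independent of each other, so they can run at the same time, but successive iterations remain sequential.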

There are two ways of estimating gradients, one easily parallelizable, one not:
around a single point, e.g. (f( x + h direction_i ) - f(x)) / h; this is easily parallelizable up to Ndim
"walking" gradient: walk from x0 in direction e0 to x1, then from x1 in direction e1 to x2 ...; this is sequential.
Minimizers that use gradients are highly developed, powerful, converge quadratically (on smooth enough functions).
The user-supplied gradient function can of course be a parallel-gradient-estimator.
A few minimizers use "walking" gradients, among them Powell's method, see Numerical Recipes p. 509.
So I'm confused: how do you parallelize its inner loop?
I'd suggest scipy fmin_tnc with a parallel-gradient-estimator, maybe using central, not one-sided, differences.
(Fwiw, this compares some of the scipy no-derivative optimizers on two 10-d functions; ymmv.)

I think what you want to do is use the threading capabilities built into Python.
Provided your worker function has more or less the same run time whatever the params, it would be efficient.
Create 8 threads in a pool, run 8 instances of your function, get 8 results, run your optimisation algorithm to change the params based on the 8 results, repeat.... profit?

If I have understood your question correctly, you are trying to minimize your function one parameter at a time.
You can do this by creating a set of single-argument functions, where each function freezes all the arguments except one.
Then you loop, optimizing each variable in turn and updating the partial solution.
This method can greatly speed up the minimization of functions of many parameters when the energy landscape is not too complex (the dependency between the parameters is not too strong).
Given a function
energy(*args) -> value
you create the guess and the functions:
guess = [1,1,1,1]
funcs = [ lambda x,i=i: energy( guess[:i]+[x]+guess[i+1:] ) for i in range(len(guess)) ]
Then you put them in a while loop for the optimization:
while convergence_condition:
    for func in funcs:
        optimize for func
        update the guess
    check for convergence
This is a very simple yet effective way of simplifying your minimization task. I can't recall what this method is called, but a close look at the Wikipedia entry on minimization should do the trick.
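The loop above is usually called coordinate descent. A runnable sketch of it follows, with assumptions that go beyond the answer: each one-dimensional minimization is done by ternary search on a fixed interval (which assumes the function is unimodal along every axis within those bounds), energy takes a list rather than separate arguments, and the example energy function is invented.

```python
def line_minimize(g, lo, hi, tol=1e-8):
    """Ternary search for the minimum of a unimodal 1-D function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def coordinate_descent(energy, guess, bounds=(-10.0, 10.0), sweeps=50, tol=1e-6):
    """Minimize energy one coordinate at a time until a sweep changes little."""
    guess = list(guess)
    for _ in range(sweeps):
        previous = list(guess)
        for i in range(len(guess)):
            # Freeze every coordinate except the i-th one.
            def along_axis(x, i=i):
                return energy(guess[:i] + [x] + guess[i + 1:])
            guess[i] = line_minimize(along_axis, *bounds)
        if max(abs(a - b) for a, b in zip(guess, previous)) < tol:
            break
    return guess


def example_energy(p):
    """Hypothetical weakly coupled objective with its minimum at (1.6, -2.4)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2 + 0.5 * p[0] * p[1]


minimum = coordinate_descent(example_energy, [0.0, 0.0])
print(minimum)
```

The weaker the coupling between the parameters, the fewer sweeps are needed; with strong coupling the method can zigzag slowly towards the minimum.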

You could parallelize in two places: 1) parallelize the calculation of a single iteration, or 2) start N initial guesses in parallel.
For 2) you need a job controller to manage the N initial-guess search threads.
Please add an extra output to your program: a "lower bound" indicating that the output values reachable by descending from the current input parameters will not go below this bound.
The N initial-guess threads can compete with each other; if any thread's lower bound is higher than another thread's current value, that thread can be dropped by your job controller.

Parallelizing local optimizers is intrinsically limited: they start from a single initial point and try to work downhill, so later points depend on the values of previous evaluations. Nevertheless there are some avenues where a modest amount of parallelization can be added.
As another answer points out, if you need to evaluate your derivative using a finite-difference method, preferably with an adaptive step size, this may require many function evaluations, but the derivative with respect to each variable may be independent; you could maybe get a speedup by a factor of twice the number of dimensions of your problem. If you've got more processors than you know what to do with, you can use higher-order-accurate gradient formulae that require more (parallel) evaluations.
Some algorithms, at certain stages, use finite differences to estimate the Hessian matrix; this requires a number of function evaluations of about half the square of the number of dimensions of your problem, and all of them can be done in parallel.
Some algorithms may also be able to use more parallelism at a modest algorithmic cost. For example, quasi-Newton methods try to build an approximation of the Hessian matrix, often updating this by evaluating a gradient. They then take a step towards the minimum and evaluate a new gradient to update the Hessian. If you've got enough processors so that evaluating a Hessian is as fast as evaluating the function once, you could probably improve these by evaluating the Hessian at every step.
As far as implementations go, I'm afraid you're somewhat out of luck. There are a number of clever and/or well-tested implementations out there, but they're all, as far as I know, single-threaded. Your best bet is to use an algorithm that requires a gradient and compute your own in parallel. It's not that hard to write an adaptive one that runs in parallel and chooses sensible step sizes for its numerical derivatives.
