Numerical integration: Why does my orbit simulation yield the wrong result? - python

I read Chapter 9 of Feynman's Lectures on Physics and tried to write my own simulation. I used Riemann sums to calculate velocity and position. Although the starting values are the same as in the lecture, my orbit looks like a hyperbola.
Here is the lecture note: https://www.feynmanlectures.caltech.edu/I_09.html (Table 9.2)
import time
import matplotlib.pyplot as plt

x = list()
y = list()
x_in = 0.5
y_in = 0.0
x.append(x_in)
y.append(y_in)

class Planet:
    def __init__(self, m, rx, ry, vx, vy, G=1):
        self.m = m
        self.rx = rx
        self.ry = ry
        self.a = 0
        self.ax = 0
        self.ay = 0
        self.vx = vx
        self.vy = vy
        self.r = (rx**2 + ry**2)**(1/2)
        self.f = 0
        self.G = 1
        print(self.r)

    def calculateOrbit(self, dt, planet):
        self.rx = self.rx + self.vx*dt
        self.vx = self.vx + self.ax*dt
        self.ax = 0
        for i in planet:
            r = (((self.rx - i.rx)**2) + ((self.ry - i.ry)**2))**(1/2)
            self.ax += -self.G*i.m*self.rx/(r**3)
        self.ry = self.ry + self.vy*dt
        self.vy = self.vy + self.ay*dt
        self.ay = 0
        for i in planet:
            self.ay += -self.G*i.m*self.ry/(r**3)
        global x, y
        x.append(self.rx)
        y.append(self.ry)
        # self.showOrbit()

    def showOrbit(self):
        print("""X: {} Y: {} Ax: {} Ay: {}, Vx: {}, Vy: {}""".format(
            self.rx, self.ry, self.ax, self.ay, self.vx, self.vy))

planets = list()
earth = Planet(1, x_in, y_in, 0, 1.630)
sun = Planet(1, 0, 0, 0, 0)
planets.append(sun)

for i in range(0, 1000):
    earth.calculateOrbit(0.1, planets)

plt.plot(x, y)
plt.grid()
plt.xlim(-20.0, 20.0)
plt.ylim(-20.0, 20.0)
plt.show()

dt is supposed to be infinitely small for the integration to work.
The bigger dt is, the bigger the "local linearization error" (the usual mathematical term is local truncation error).
0.1 for dt may look small enough, but for your result to converge correctly (towards reality) you need to check smaller time steps, too. If smaller time steps converge towards the same solution, your equation is linear enough to use a bigger time step (and save computation time).
Try your code with
for i in range(0, 10000):
    earth.calculateOrbit(0.01, planets)
and
for i in range(0, 100000):
    earth.calculateOrbit(0.001, planets)
In both calculations the overall time that has passed since the beginning is the same as with your original values. But the result is quite different. So you might have to use an even smaller dt.
More info:
https://en.wikipedia.org/wiki/Trapezoidal_rule
And this page states what you are doing:
A 'brute force' kind of numerical integration can be done, if the
integrand is reasonably well-behaved (i.e. piecewise continuous and of
bounded variation), by evaluating the integrand with very small
increments.
And what I tried to explain above:
An important part of the analysis of any numerical integration method
is to study the behavior of the approximation error as a function of
the number of integrand evaluations.
There are many smart approaches that make better guesses and allow larger time steps. You might have heard of the Runge–Kutta method for solving differential equations. It seems to become Simpson's rule, mentioned in the link above, for non-differential equations; see here.
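To make the convergence check above concrete, here is a minimal sketch that reuses the Planet class from the question and reruns the simulation with successively smaller time steps over the same simulated time span (total_time is an arbitrary choice of mine), printing the final position each time:

total_time = 100.0   # same overall span as 1000 steps of dt = 0.1

for dt in (0.1, 0.01, 0.001):
    steps = int(total_time / dt)
    earth = Planet(1, x_in, y_in, 0, 1.630)
    sun = Planet(1, 0, 0, 0, 0)
    for _ in range(steps):
        earth.calculateOrbit(dt, [sun])
    print("dt =", dt, "-> final position:", round(earth.rx, 4), round(earth.ry, 4))

If the final positions keep changing noticeably as dt shrinks, the time step is still too coarse for the result to be trusted.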

The problem is the method of numerical integration, i.e. the numerical solution of the differential equations. The method you're using (Euler's method) gets very close to the actual value at each step, but still leaves a small error. When this slightly wrong value is reused over many iterations (you have done 1000 steps), the error grows at every step, which gives you the wrong result.
There are two solutions to this problem:
Decrease the time interval to an even smaller value so that, even after the errors are amplified throughout the process, the result doesn't deviate much from the actual solution. Note that if you decrease the time interval (dt) and do not increase the number of steps, you only see the evolution of the system for a shorter period of time, so you'll have to increase the number of steps along with the decrease in dt. I checked your code, and if you put dt = 0.0001 and use 100000 steps instead of just 1000, you get the beautiful elliptical orbit you're looking for. Also, delete or comment out the plt.xlim and plt.ylim lines to see your plot clearly.
Implement the Runge-Kutta method for the numerical solution of differential equations. It converges to the actual solution faster per iteration. Since it takes more time and more changes to your code, I'm suggesting it as the second option, but it is superior to and more general than Euler's method.
Note: Even without any changes, the solution is not behaving like a hyperbola. For the initial values you've provided, the solution is a bounded curve, but because of the error amplification it spirals into a point. You'll notice this spiraling-in if you just increase the steps to 10000 and put dt = 0.01.
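To make option 2 concrete, here is a minimal, hedged RK4 sketch for the same sun-at-the-origin setup. The state vector [x, y, vx, vy], the deriv/rk4_step names, and G = M = 1 are my own choices, not part of the question:

import matplotlib.pyplot as plt

G, M = 1.0, 1.0  # same units as in the question

def deriv(state):
    # state = [x, y, vx, vy]; the sun is fixed at the origin
    x, y, vx, vy = state
    r3 = (x*x + y*y) ** 1.5
    return [vx, vy, -G * M * x / r3, -G * M * y / r3]

def rk4_step(state, dt):
    k1 = deriv(state)
    k2 = deriv([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = deriv([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = deriv([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a + 2*b + 2*c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

state = [0.5, 0.0, 0.0, 1.630]   # same initial conditions as the question
xs, ys = [], []
for _ in range(1000):
    state = rk4_step(state, 0.1)
    xs.append(state[0])
    ys.append(state[1])

plt.plot(xs, ys)
plt.grid()
plt.show()

With the same dt = 0.1 this already produces a closed ellipse instead of a spiral, because the per-step error of RK4 is much smaller than Euler's.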


Optimization method selection & dealing with convergence and variability

The Problem
I am looking to tackle a minimization problem using scipy's optimization utilities.
Specifically, I've been using this function:
result = spo.minimize(s21_mag, goto_start, options={"disp": True}, bounds=bnds)
My s21_mag function takes a couple of seconds to return an output (due to physically moving motors). It consists of 3 parameters (3 moving parts), with no constraints - just three bounds (identical for all 3 parameters):
bnds = ((0,45000),(0,45000),(0,45000))
The limit on the number of iterations is not very constraining (1000 is probably a good enough upper limit for me), but I expect the optimizer to try many configurations within this budget to identify an optimal value. So far, some methods I've tried just seem to converge somewhere without meaningful progress.
Here's progress beyond the 50th iteration (full code here) - the goal is the maximization of S21 at a specific frequency (purple vertical line):
This is with no method passed to spo.minimize(), so it uses the default (and it looks like it applies the exact same movement to each motor).
Questions
Although scipy's minimization function offers a wide variety of optimization methods/algorithms, how could I (as a beginner in optimization math) select the one that would work best for my application? What kind of aspects of my problem should I take into account to jump to such conclusions? Assume I have no idea about the initial value of each parameter and want the optimizer to figure that out (I usually just set it to the midpoint, i.e. initial: x1=x2=x3=22500).
The same set of parameters as an input to my s21_mag function could yield different results at different times the function is called.
This happens for two reasons:
(a) The parameter step of the optimizer can get extremely small (particularly as the number of iterations increases and convergence is approached), whereas the motor expects a minimum value of ~100 to make a step.
Is there a way to somehow set a minimum step? Otherwise, it tries to step from e.g. 1234.0 to 1234.0001 and eventually gets "stuck" between trying tiny changes.
(b) The output of the function goes through a measuring instrument, which exhibits a little bit of noise (e.g. one measurement may yield 5.42 dB, while another measurement (with the exact same parameters) may yield 5.43 dB).
Is there a way to deal with these kinds of small variabilities/errors to avoid confusions for the optimizer?
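For illustration only, here is a minimal sketch of the kind of wrapper that addresses both (a) and (b): it snaps the parameters to the motor's minimum useful step and averages a few repeated measurements before handing the value to the optimizer. MOTOR_STEP, N_REPEATS and the choice of Powell are my own assumptions; s21_mag is the function from the question:

import numpy as np
from scipy import optimize as spo

MOTOR_STEP = 100   # assumed smallest useful motor increment
N_REPEATS = 3      # assumed number of repeated measurements to average

def s21_mag_wrapped(params):
    # Snap to the motor resolution, then average out instrument noise.
    snapped = np.round(np.asarray(params) / MOTOR_STEP) * MOTOR_STEP
    return np.mean([s21_mag(snapped) for _ in range(N_REPEATS)])

bnds = ((0, 45000), (0, 45000), (0, 45000))
x0 = [22500, 22500, 22500]
result = spo.minimize(s21_mag_wrapped, x0, method="Powell",
                      options={"disp": True}, bounds=bnds)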

Algorithm to smooth out noise in a running system while converging on an initially unknown constant

I'm trying to smooth out noise in a slightly unusual situation. There's probably a common algorithm to solve my problem, but I don't know what it is.
I'm currently building a robotic telescope mount. To track the movement of the sky, the mount takes a photo of the sky once per second and tracks changes in the X, Y, and rotation of the stars it can see.
If I just use the raw measurements to track rotation, the output is choppy and noisy, like this:
Guiding with raw rotation measurements:
If I use a lowpass filter, the mount overshoots and never completely settles down. A lower Beta value helps with this, but then the corrections are too slow and error accumulates.
Guiding with lowpass filter:
(In both graphs, purple is the difference between sky and mount rotation, red is the corrective rotations made by the mount.)
A moving average had the same problems as the lowpass filter.
More information about the problem:
For a given area of the sky, the rotation of the stars will be constant. However, we don't know where we are and the measurement of sky rotation is very noisy due to atmospheric jitter, so the algorithm has to work its way towards this initially unknown constant value while guiding.
The mount can move as far as necessary in one second, and has its own control system. So I don't think this is a PID loop control system problem.
It's OK to guide badly (or not at all) for the first 30 seconds or so.
I wrote a small Python program to simulate the problem - might as well include it here, I suppose. This one is currently using a lowpass filter.
#!/usr/bin/env python3
import random
import matplotlib.pyplot as plt

ROTATION_CONSTANT = 0.1
TIME_WINDOW = 300

skyRotation = 0
mountRotation = 0
error = 0
errorList = []
rotationList = []
measurementList = []
smoothData = 0
LPF_Beta = 0.08

for step in range(TIME_WINDOW):
    skyRotation += ROTATION_CONSTANT
    randomNoise = random.random() - random.random()
    rotationMeasurement = skyRotation - mountRotation + randomNoise

    # Lowpass filter
    smoothData = smoothData - (LPF_Beta * (smoothData - rotationMeasurement))
    mountRotation += smoothData

    rotationList.append(smoothData)
    errorList.append(skyRotation - mountRotation)
    measurementList.append(rotationMeasurement)

plt.plot([0, TIME_WINDOW], [ROTATION_CONSTANT, ROTATION_CONSTANT],
         color='black', linestyle='-', linewidth=2)
plt.plot(errorList, color="purple")
plt.plot(rotationList, color="red")
plt.plot(measurementList, color="blue", alpha=0.2)
plt.axis([0, TIME_WINDOW, -1.5, 1.5])
plt.xlabel("Time (seconds)")
plt.ylabel("Rotation (degrees)")
plt.show()
If anyone knows how to make this converge smoothly (or could recommend relevant learning resources), I would be most grateful. I'd be happy to read up on the topic but so far haven't figured out what to look for!
I would first of all try and do this the easy way by making your control outputs the result of a PID and then tuning the PID as described at e.g. https://robotics.stackexchange.com/questions/167/what-are-good-strategies-for-tuning-pid-loops or from your favourite web search.
Most other approaches require you to have an accurate model of the situation, including the response of the hardware under control to your control inputs, so your next step might be experiments to try and work this out, e.g. by attempting to work out the response to simple test inputs, such as an impulse or a step. Once you have a simulator you can, at the very least, tune parameters for proposed approaches more quickly and safely on the simulator than on the real hardware.
If your simulator is accurate, and if you are seeing more problems in the first 30 seconds than afterwards, I suggest using a Kalman filter to estimate the current error, and then sending in the control that (according to the model that you have constructed) will minimise the mean squared error between the time the control is acted upon and the time of the next observation. Using a Kalman filter will at least take account of the increased observational error when the system starts up.
Warning: the above use of the Kalman filter is myopic, and will fail dramatically in some situations where there is something corresponding to momentum: it will over-correct and end up swinging wildly from one extreme to another. Better use of the Kalman filter results would be to compute a number of control inputs, minimizing the predicted error at the end of this sequence of inputs (e.g. with dynamic programming) and then revisit the problem after the first control input has been executed. In the simple example where I found over-correction you can get stable behavior if you calculate the single control action that minimizes the error if sustained for two time periods, but revisit the problem and recalculate the control action at the end of one time period. YMMV.
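To make the Kalman suggestion concrete, here is a minimal sketch layered on the simulation from the question. The state is [pointing error, sky rotation rate]; the matrices, noise values and the myopic control law are my own illustrative assumptions, not tuned numbers:

#!/usr/bin/env python3
import random
import numpy as np
import matplotlib.pyplot as plt

ROTATION_CONSTANT = 0.1
TIME_WINDOW = 300
MEAS_VAR = 1.0 / 6.0                  # variance of random() - random()

F = np.array([[1.0, 1.0],             # error_k+1 = error_k + rate_k - correction_k
              [0.0, 1.0]])            # rate_k+1  = rate_k (the unknown constant)
B = np.array([[-1.0], [0.0]])
H = np.array([[1.0, 0.0]])
Q = np.diag([1e-6, 1e-6])             # small assumed process noise
R = np.array([[MEAS_VAR]])

xhat = np.array([[0.0], [0.0]])       # start knowing nothing about the rate
P = np.diag([1.0, 1.0])

skyRotation = mountRotation = 0.0
errorList, correctionList = [], []

for step in range(TIME_WINDOW):
    skyRotation += ROTATION_CONSTANT
    z = skyRotation - mountRotation + (random.random() - random.random())

    # Measurement update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    xhat = xhat + K @ (np.array([[z]]) - H @ xhat)
    P = (np.eye(2) - K @ H) @ P

    # Myopic control: cancel the estimated error plus one second of estimated rate
    u = float(xhat[0, 0] + xhat[1, 0])
    mountRotation += u

    # Time update
    xhat = F @ xhat + B * u
    P = F @ P @ F.T + Q

    errorList.append(skyRotation - mountRotation)
    correctionList.append(u)

plt.plot(errorList, color="purple")
plt.plot(correctionList, color="red")
plt.xlabel("Time (seconds)")
plt.ylabel("Rotation (degrees)")
plt.show()

Because the rate is part of the state, the filter's estimate of it tightens over the first tens of seconds, which matches the requirement that guiding may be poor for the first 30 seconds or so.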
If that doesn't work, perhaps it is time to take your accurate simulation, linearize it to get differential equations, and apply classical control theory. If it won't linearize reasonably over the full range, you could try breaking that range down, perhaps using different strategies for large and small errors.
Such (little) experience as I have from control loops suggests that it is extremely important to minimize the delay and jitter in the loop between the sensor sensing and the control actuating. If there is any unnecessary source of jitter or delay between input and control forget the control theory while you get that fixed.

Python: Minimization of a function with potentially random outputs

I'm looking to minimize a function with potentially random outputs. Traditionally, I would use something from the scipy.optimize library, but I'm not sure if it'll still work if the outputs are not deterministic.
Here's a minimal example of the problem I'm working with:
import random

def myfunction(a):
    noise = random.gauss(0, 1)
    return abs(a + noise)
Any thoughts on how to algorithmically minimize its expected (or average) value?
A numerical approximation would be fine, as long as it can get "relatively" close to the actual value.
We already reduced noise by averaging over many possible runs, but the function is a bit computationally expensive and we don't want to do more averaging if we can help it.
It turns out that for our application, the scipy.optimize anneal algorithm provided a good enough estimate of the local minimum.
For more complex problems, pjs points out that Waeber, Frazier and Henderson (2011) provide a better solution.
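For what it's worth, here is a minimal sketch of the averaging approach mentioned above combined with a derivative-free method from scipy.optimize; the sample count n and the choice of Nelder-Mead are my own assumptions (scipy.optimize.anneal no longer exists in recent SciPy versions, where dual_annealing is the closest replacement):

import random
import numpy as np
from scipy import optimize

def myfunction(a):
    noise = random.gauss(0, 1)
    return abs(a + noise)

def averaged(a, n=25):
    # Estimate the expected value by averaging n noisy evaluations.
    a0 = float(np.atleast_1d(a)[0])
    return np.mean([myfunction(a0) for _ in range(n)])

# Nelder-Mead is derivative-free, so it tolerates the residual noise better
# than gradient-based methods that difference nearly identical noisy values.
res = optimize.minimize(averaged, x0=[3.0], method="Nelder-Mead")
print(res.x, res.fun)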

Interpolaton algorithm to correct a slight clock drift

I have some sampled (univariate) data - but the clock driving the sampling process is inaccurate - resulting in a random slip of (less than) 1 sample every 30. A more accurate clock at approximately 1/30 of the frequency provides reliable samples for the same data ... allowing me to establish a good estimate of the clock drift.
I am looking to interpolate the sampled data to correct for this so that I 'fit' the high frequency data to the low-frequency. I need to do this 'real time' - with no more than the latency of a few low-frequency samples.
I recognise that there is a wide range of interpolation algorithms - and, among those I've considered, a spline based approach looks most promising for this data.
I'm working in Python - and have found the scipy.interpolate package - though I could see no obvious way to use it to 'stretch' n samples to correct a small timing error. Am I overlooking something?
I am interested in pointers to either a suitable published algorithm, or - ideally - a Python library function to achieve this sort of transform. Is this supported by SciPy (or anything else)?
UPDATE...
I'm beginning to realise that what at first seemed a trivial problem isn't as straightforward as I first thought. I am no longer convinced that naive use of splines will suffice. I've also realised that my problem can be better described without reference to 'clock drift'... like this:
A single random variable is sampled at two different frequencies - one low and one high, with no common divisor - e.g. 5 Hz and 144 Hz. If we assume sample 0 is identical at both sample rates, sample 1 at 5 Hz falls between samples 28 and 29 at 144 Hz. I want to construct a new series - at 720 Hz, say - that fits all the known data points "as smoothly as possible".
I had hoped to find an 'out of the box' solution.
Before you can ask the programming question, it seems to me you need to investigate a more fundamental scientific one.
Before you can start picking out particular equations to fit badfastclock to goodslowclock, you should investigate the nature of the drift. Let both clocks run a while, and look at their points together. Is badfastclock bad because it drifts linearly away from real time? If so, a simple quadratic equation should fit badfastclock to goodslowclock, just as a quadratic equation describes the linear acceleration of an object in gravity; i.e., if badfastclock is accelerating linearly away from real time, you can deterministically shift badfastclock toward real time. However, if you find that badfastclock is bad because it is jumping around, then smooth curves -- even complex smooth curves like splines -- won't fit. You must understand the data before trying to manipulate it.
Based on your updated question, if the data is smooth with time, just place all the samples in a time trace and interpolate on the sparse grid (time).
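As a minimal sketch of that suggestion (the names, the made-up sine signal, and the drift factor standing in for the clock error you have already estimated are all mine): merge both streams on one drift-corrected time axis, fit a single spline, and evaluate it on a dense 720 Hz grid.

import numpy as np
from scipy.interpolate import CubicSpline

f_lo, f_out = 5.0, 720.0
f_hi_nominal = 144.0
drift = 1.003                                     # assumed estimated clock-rate error

t_lo = np.arange(0, 1, 1 / f_lo)                  # trusted low-rate timestamps
t_hi = np.arange(0, 1, 1 / f_hi_nominal) * drift  # drift-corrected fast timestamps

def signal(t):                                    # made-up stand-in for the real data
    return np.sin(2 * np.pi * 2.0 * t)

v_lo, v_hi = signal(t_lo), signal(t_hi)

# Merge every known (time, value) point and fit one smooth spline through them.
t_all = np.concatenate([t_lo, t_hi])
v_all = np.concatenate([v_lo, v_hi])
t_uni, idx = np.unique(t_all, return_index=True)  # sort, drop duplicate timestamps
spline = CubicSpline(t_uni, v_all[idx])

t_out = np.arange(0, 1, 1 / f_out)                # dense 720 Hz output grid
resampled = spline(t_out)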

Parallel many dimensional optimization

I am building a script that generates input data [parameters] for another program to calculate. I would like to optimize the resulting data. Previously I have been using scipy's Powell optimization. The pseudo code looks something like this.
def value(param):
    run_program(param)
    # Parse output
    return value
scipy.optimize.fmin_powell(value, param)
This works great; however, it is incredibly slow as each iteration of the program can take days to run. What I would like to do is coarse grain parallelize this. So instead of running a single iteration at a time it would run (number of parameters)*2 at a time. For example:
# Initial guess
param = [1, 2, 3, 4, 5]
# Modify guess by plus/minus another matrix that is changeable at each iteration
jump = [1, 1, 1, 1, 1]
# Modify each variable plus/minus jump.
for num, a in enumerate(param):
    new_param1 = param[:]
    new_param1[num] = new_param1[num] + jump[num]
    run_program(new_param1)
    new_param2 = param[:]
    new_param2[num] = new_param2[num] - jump[num]
    run_program(new_param2)
# Wait until all programs are complete -> Parse output
Output = [[value, param], ...]
# Create new guess
# Repeat
The number of variables can range from 3 to 12, so something like this could potentially speed up the code from taking a year down to a week. All variables are dependent on each other and I am only looking for local minima from the initial guess. I have started an implementation using Hessian matrices; however, that is quite involved. Is there anything out there that already does this, is there a simpler way, or do you have any suggestions to get started?
So the primary question is the following:
Is there an algorithm that takes a starting guess, generates multiple guesses, then uses those multiple guesses to create a new guess, and repeats until a threshold is found. Only analytic derivatives are available. What is a good way of going about this, is there something built already that does this, is there other options?
Thank you for your time.
As a small update, I do have this working by calculating simple parabolas through the three points of each dimension and then using the minimum as the next guess. This seems to work decently, but is not optimal. I am still looking for additional options.
The current best implementation parallelizes the inner loop of Powell's method.
Thank you everyone for your comments. Unfortunately it looks like there is simply no concise answer to this particular problem. If I get around to implementing something that does this I will paste it here; however, as the project is not particularly important nor the need for results pressing, I will likely be content letting it take up a node for a while.
I had the same problem while I was at university; we had a Fortran algorithm to calculate the efficiency of an engine based on a group of variables. At the time we used modeFRONTIER and, if I recall correctly, none of the algorithms were able to generate multiple guesses.
The normal approach would be to have a DOE, and there were some algorithms to generate the DOE to best fit your problem. After that we would run the single DOE entries in parallel and an algorithm would "watch" the development of the optimizations, showing the current best design.
Side note: if you don't have a cluster and need more computing power, HTCondor may help you.
Are derivatives of your goal function available? If yes, you can use gradient descent (old, slow but reliable) or conjugate gradient. If not, you can approximate the derivatives using finite differences and still use these methods. I think in general, if using finite difference approximations to the derivatives, you are much better off using conjugate gradients rather than Newton's method.
A more modern method is SPSA, which is a stochastic method and doesn't require derivatives. For somewhat well-behaved problems, SPSA needs far fewer evaluations of the goal function than a finite-difference approximation to conjugate gradients for the same rate of convergence.
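For reference, here is a minimal SPSA sketch (the gain constants and the toy objective are my own illustrative choices); note that it costs only two function evaluations per iteration regardless of dimension, and those two evaluations can run in parallel:

import numpy as np

def spsa_minimize(f, x0, a=0.1, c=0.1, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_iter + 1):
        ak = a / k ** 0.602                           # standard SPSA gain decay
        ck = c / k ** 0.101
        delta = rng.choice([-1.0, 1.0], size=x.size)  # random +/-1 perturbation
        g = (f(x + ck * delta) - f(x - ck * delta)) / (2 * ck * delta)
        x = x - ak * g                                # gradient-like step
    return x

# Toy objective standing in for the expensive external program.
print(spsa_minimize(lambda x: float(np.sum((x - 3.0) ** 2)), x0=[1, 2, 3, 4, 5]))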
There are two ways of estimating gradients, one easily parallelizable, one not:
around a single point, e.g. (f(x + h * direction_i) - f(x)) / h; this is easily parallelizable up to Ndim;
"walking" gradient: walk from x0 in direction e0 to x1, then from x1 in direction e1 to x2, and so on; this is sequential.
Minimizers that use gradients are highly developed, powerful, and converge quadratically (on smooth enough functions). The user-supplied gradient function can of course be a parallel-gradient-estimator.
A few minimizers use "walking" gradients, among them Powell's method; see Numerical Recipes p. 509. So I'm confused: how do you parallelize its inner loop?
I'd suggest scipy fmin_tnc with a parallel-gradient-estimator, maybe using central, not one-sided, differences.
(Fwiw, this compares some of the scipy no-derivative optimizers on two 10-d functions; ymmv.)
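A rough sketch of that suggestion, using scipy.optimize.minimize with method="TNC" and a user-supplied gradient that evaluates all central differences in parallel; the toy expensive() function and the step size h are placeholders of mine for the real run-and-parse step:

import numpy as np
from concurrent.futures import ProcessPoolExecutor
from scipy import optimize

def expensive(x):
    # Stand-in for the external program: replace with "run program, parse output".
    return float(np.sum((np.asarray(x) - 3.0) ** 2))

def parallel_gradient(x, h=1e-3):
    # Central differences: 2 * Ndim evaluations, all launched in parallel.
    x = np.asarray(x, dtype=float)
    points = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        points.extend([x + step, x - step])
    with ProcessPoolExecutor() as pool:
        vals = list(pool.map(expensive, points))
    return np.array([(vals[2 * i] - vals[2 * i + 1]) / (2 * h)
                     for i in range(x.size)])

if __name__ == "__main__":
    res = optimize.minimize(expensive, x0=[1, 2, 3, 4, 5],
                            jac=parallel_gradient, method="TNC")
    print(res.x, res.fun)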
I think what you want to do is use the threading capabilities built into Python.
Provided your working function has more or less the same run-time whatever the params, it would be efficient.
Create 8 threads in a pool, run 8 instances of your function, get 8 results, run your optimisation algo to change the params using those 8 results, repeat... profit?
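A small sketch of that idea, applied to the plus/minus candidates from the question's pseudo code. run_program here is a toy stand-in of mine for launching and parsing the real external program, and threads suffice because each worker mostly waits on that external job:

import time
from concurrent.futures import ThreadPoolExecutor

def run_program(candidate):
    # Toy stand-in: pretend the external program is slow and returns a score
    # (replace the body with "launch job, wait, parse output").
    time.sleep(0.1)
    return sum((c - 3) ** 2 for c in candidate)

def neighbours(param, jump):
    # The 2 * len(param) plus/minus candidates from the question's pseudo code.
    cands = []
    for i in range(len(param)):
        up, down = param[:], param[:]
        up[i] += jump[i]
        down[i] -= jump[i]
        cands.extend([up, down])
    return cands

param, jump = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
cands = neighbours(param, jump)
with ThreadPoolExecutor(max_workers=len(cands)) as pool:
    scores = list(pool.map(run_program, cands))
results = sorted(zip(scores, cands))   # the [[value, param], ...] list from the question
print(results[0])                      # best candidate becomes the next guess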
If I haven't misunderstood what you are asking, you are trying to minimize your function one parameter at a time.
You can do that by creating a set of functions of a single argument, where for each function you freeze all the arguments except one.
Then you loop over the variables, optimizing each one and updating the partial solution.
This method can speed up minimization of functions of many parameters by a great deal when the energy landscape is not too complex (i.e. the dependency between the parameters is not too strong).
Given a function
energy(*args) -> value
you create the guess and the functions:
guess = [1, 1, 1, 1]
funcs = [lambda x, i=i: energy(guess[:i] + [x] + guess[i+1:]) for i in range(len(guess))]
Then you put them in a while loop for the optimization:
while convergence_condition:
    for func in funcs:
        optimize for func
        update the guess
    check for convergence
This is a very simple yet effective method of simplifying your minimization task. I can't really recall what this method is called (it is essentially coordinate descent), but a close look at the Wikipedia entry on minimization should do the trick.
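Here is a small runnable sketch of that loop using scipy.optimize.minimize_scalar; the toy energy function and the fixed number of sweeps are my own stand-ins for the real objective and convergence test:

from scipy.optimize import minimize_scalar

def energy(*args):
    # Toy stand-in for the expensive objective.
    return sum((a - i) ** 2 for i, a in enumerate(args))

guess = [1.0, 1.0, 1.0, 1.0]
for sweep in range(10):                          # crude convergence condition
    for i in range(len(guess)):
        f_i = lambda x, i=i: energy(*(guess[:i] + [x] + guess[i + 1:]))
        guess[i] = minimize_scalar(f_i).x        # optimize one coordinate, freeze the rest
print(guess)                                     # approaches [0, 1, 2, 3]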
You could parallelize at two levels: 1) parallelize the calculation of a single iteration, or 2) run N initial guesses in parallel.
For 2) you need a job controller to manage the N initial-guess discovery threads.
Please add an extra output to your program: a "lower bound" indicating that the output values reachable by descending from the current input parameters won't go below this bound.
The N initial-guess threads can then compete with each other; if any thread's lower bound is higher than another thread's current value, that thread can be dropped by your job controller.
Parallelizing local optimizers is intrinsically limited: they start from a single initial point and try to work downhill, so later points depend on the values of previous evaluations. Nevertheless there are some avenues where a modest amount of parallelization can be added.
As another answer points out, if you need to evaluate your derivative using a finite-difference method, preferably with an adaptive step size, this may require many function evaluations, but the derivative with respect to each variable may be independent; you could maybe get a speedup by a factor of twice the number of dimensions of your problem. If you've got more processors than you know what to do with, you can use higher-order-accurate gradient formulae that require more (parallel) evaluations.
Some algorithms, at certain stages, use finite differences to estimate the Hessian matrix; this requires about half the square of the number of dimensions of your problem in function evaluations, and all of them can be done in parallel.
Some algorithms may also be able to use more parallelism at a modest algorithmic cost. For example, quasi-Newton methods try to build an approximation of the Hessian matrix, often updating this by evaluating a gradient. They then take a step towards the minimum and evaluate a new gradient to update the Hessian. If you've got enough processors so that evaluating a Hessian is as fast as evaluating the function once, you could probably improve these by evaluating the Hessian at every step.
As far as implementations go, I'm afraid you're somewhat out of luck. There are a number of clever and/or well-tested implementations out there, but they're all, as far as I know, single-threaded. Your best bet is to use an algorithm that requires a gradient and compute your own in parallel. It's not that hard to write an adaptive one that runs in parallel and chooses sensible step sizes for its numerical derivatives.
