Genetic Algorithm in Optimization of Events - python

I'm a data analysis student and I'm starting to explore Genetic Algorithms at the moment. I'm trying to solve a problem with GA but I'm not sure about the formulation of the problem.
Basically I have a state of a variable being 0 or 1 (0 it's in the normal range of values, 1 is in a critical state). When the state is 1 I can apply 3 solutions (let's consider Solution A, B and C) and for each solution I know the time that the solution was applied and the time where the state of the variable goes to 0.
So I have for the problem a set of data that have a critical event at 1, the solution applied and the time interval (in minutes) from the critical event to the application of the solution, and the time interval (in minutes) from the application of the solution until the event goes to 0.
I want with a genetic algorithm to know which is the best solution for a critical event and the fastest one. And if it is possible to rank the solutions acquired so if in the future on solution can't be applied I can always apply the second best for example.
I'm thinking of developing the solution in Python since I'm new to GA.
Edit: Specifying the problem (responding to AMack)
Yes is more a less that but with some nuances. For example the function A can be more suitable to make the variable go to F but because exist other problems with the variable are applied more than one solution. So on the data that i receive for an event of V, sometimes can be applied 3 ou 4 functions but only 1 or 2 of them are specialized for the problem that i want to analyze. My objetive is to make a decision support on the solution to use when determined problem appear. But the optimal solution can be more that one because for some event function A acts very fast but in other case of the same event function A don't produce a fast response and function C is better in that case. So in the end i pretend a solution where is indicated what are the best solutions to the problem but not only the fastest because the fastest in the majority of the cases sometimes is not the fastest in the same issue but with a different background.

I'm unsure of what your question is, but here are the elements you need for any GA:
A population of initial "genomes"
A ranking function
Some form of mutation, crossing over within the genome
and reproduction.
If a critical event is always the same, your GA should work very well. That being said, if you have a different critical event but the same genome you will run into trouble. GA's evolve functions towards the best possible solution for A Set of conditions. If you constantly run the GA so that it may adapt to each unique situation you will find a greater degree of adaptability, but have a speed issue.
You have a distinct advantage using python because string manipulation (what you'll probably use for the genome) is easy, however...
python is slow.
If the genome is short, the initial population is small, and there are very few generations this shouldn't be a problem. You lose possibly better solutions that way but it will be significantly faster.
have fun...

You should take a look at the GARAGe Michigan State. They are a GA research group with a fair number of resources in terms of theory, papers, and software that should provide inspiration.

To start, let's make sure I understand your problem.
You have a set of sample data, each element containing a time series of a binary variable (we'll call it V). When V is set to True, a function (A, B, or C) is applied which returns V to it's False state. You would like to apply a genetic algorithm to determine which function (or solution) will return V to False in the least amount of time.
If this is the case, I would stay away from GAs. GAs are typically used for some kind of function optimization / tuning. In general, the underlying assumption is that what you permute is under your control during the algorithm's application (i.e., you are modifying parameters used by the algorithm that are independent of the input data). In your case, my impression is that you just want to find out which of your (I assume) static functions perform best in a wide variety of cases. If you don't feel your current dataset provides a decent approximation of your true input distribution, you can always sample from it and permute the values to see what happens; however, this would not be a GA.
Having said all of this, I could be wrong. If anyone has used GAs in verification like this, please let me know. I'd certainly be interested in learning about it.

Related

Slight difference in objective function of linear programming makes program extremely slow

I am using Google's OR Tool SCIP (Solving Constraint Integer Programs) solver to solve a Mixed integer programming problem using Python. The problem is a variant of the standard scheduling problem, where there are constraints limiting that each worker works maximum once per day and that every shift is covered by only one worker. The problem is modeled as follows:
Where n represents the worker, d the day and i the specific shift in a given day.
The problem comes when I change the objective function that I want to minimize from
To:
In the first case an optimal solution is found within 5 seconds. In the second case, after 20 minutes running, the optimal solution was still not reached. Any ideas to why this happens?
How can I change the objective function without impacting performance this much?
Here is a sample of the values taken by the variables tier and acceptance used in the objective function.
You should ask the SCIP team.
Have you tried using the SAT backend with 8 threads ?
The only thing that I can spot from reading your post is that the objective function is no longer pure integer after adding the acceptance. If you know that your objective is always integer that helps during the solve since you can also round up all your dual bounds. This might be critical for your problem class.
Maybe you could also post a SCIP log (preferable with statistics) of the two runs?

Simultaneous recursion on equivalent graphs using Python

I have a structure, looking a lot like a graph but I can 'sort' it. Therefore I can have two graphs, that are equivalent, but one is sorted and not the other. My goal is to compute a minimal dominant set (with a custom algorithm that fits my specific problem, so please do not link to other 'efficient' algorithms).
The thing is, I search for dominant sets of size one, then two, etc until I find one. If there isn't a dominant set of size i, using the sorted graph is a lot more efficient. If there is one, using the unsorted graph is much better.
I thought about using threads/multiprocessing, so that both graphs are explored at the same time and once one finds an answer (no solution or a specific solution), the other one stops and we go to the next step or end the algorithm. This didn't work, it just makes the process much slower (even though I would expect it to just double the time required for each step, compared to using the optimal graph without threads/multiprocessing).
I don't know why this didn't work and wonder if there is a better way, that maybe doesn't even required the use of threads/multiprocessing, any clue?
If you don't want an algorithm suggestion, then lazy evaluation seems like the way to go.
Setup the two in a data structure such that with a class_instance.next_step(work_to_do_this_step) where a class instance is a solver for one graph type. You'll need two of them. You can have each graph move one "step" (whatever you define a step to be) forward. By careful selection (possibly dynamically based on how things are going) of what a step is, you can efficiently alternate between how much work/time is being spent on the sorted vs unsorted graph approaches. Of course this is only useful if there is at least a chance that either algorithm may finish before the other.
In theory if you can independently define what those steps are, then you could split up the work to run them in parallel, but it's important that each process/thread is doing roughly the same amount of "work" so they all finish about the same time. Though writing parallel algorithms for these kinds of things can be a bit tricky.
Sounds like you're not doing what you describe. Possibly you're waiting for BOTH to finish somehow? Try doing that, and seeing if the time changes.

Method for finding best result from (almost) random data

So I'm working on some calculator for a game I play - for fun, which takes various abilities with different cooldowns, usage times, a percentage in which they may be used at etc ...
So far I am doing this by analyzing numbers in base however many abilities I have, so for example assuming i have 5 abilities used over 4 seconds:
0000: 60 damage (using ability 0, trying to use it again but failing - so returns ability damage of 0)
0001: 60 damage
Skip a few ...
0101: 200 damage
and again ...
4444: 70 damage.
Process terminates. - Hope that made sense.
Problem is, doing this in brute force works well with small times (like above) and number or abilities, however at much higher times and number of abilities it runs analyzing trillions of simulations which means brute force no longer becomes an option.
Question is, considering the data is mostly random, are there any heuristic algorithm's that (all thought may not return the optimal) will return a relatively good result.
Thanks for any responses :)
Let me rephrase to make sure I understand correctly: you want to find the best sequencing of skills, given their individual damage and cooldowns, such that only one skill is used at each time, and no skill is used more often than its cooldown allows. If so, it is a kind of a scheduling problem and one way to approach would be through linear programming.
The rough idea is to introduce n_skills * simulation_length variables x[skill][time], each constrained between 0 and 1, with the interpretation of "use skill skill at time time if x[skill][time] == 1, don't use if == 0". Now you optimize the sum of all variables weighted by the damage their skill does, sum(x[skill][:] * damage[skill] for skill in skills), under additional linear constraints (explained through numpy-like pseudocode):
for each time t, sum(x[:][t]) <= 1 (at each time you can use at most one ability)
for each ability a and time t0 sum(x[a][t0-cooldown(a):t0+cooldown(a)] <= 1 (within the period of its cooldown, you can only use your ability at most once)
Now the tricky part is that while this will give you a solution that is optimal in some sense, it will most likely not be physical, that is you'll get fractional xs. This is where the heuristic part kicks in, you have to find some way to "round" the solution to integers, losing objective value in the process, to make it physically (game-ally) meaningful. One way is to only keep x[a][t] == 1, and round all other numbers down to zero. It will give a meaningful solution, but it may not be very satisfying (ie. your character will do almost nothing). Given that my model for the problem is quite simple, I would expect there are some theoretical results on how to give a good rounding.
While I can suggest the scipy package for solving the linear program once it's formulated, the whole problem of building the constraint matrix and rounding the results (even trivially) is not a beginner-level programming task.

Function to determine a reasonable initial guess for scipy.optimize?

I'm using scipy.optimize.minimize to find the minimum of a 4D function that is rather sensitive to the initial guess used. If I vary it a little bit, the solution will change considerably.
There are many questions similar to this one already in SO (e.g.: 1, 2, 3), but no real answer.
In an old question of mine, one of the developers of the zunzun.com site (apparently no longer online) explained how they managed this:
Zunzun.com uses the Differential Evolution genetic algorithm (DE) to find initial parameter estimates which are then passed to the Levenberg-Marquardt solver in scipy. DE is not actually used as a global optimizer per se, but rather as an "initial parameter guesser".
The closest I've found to this algorithm is this answer where a for block is used to call the minimizing function many times with random initial guesses. This generates multiple minimized solutions, and finally the best (smallest value) one is picked.
Is there something like what the zunzun dev described already implemented in Python?
There is no general answer for such question, as a problem of minimizing arbitrary function is impossible to solve. You can do better or worse on particular classes of functions, thus it is rather a domain for mathematician, to analyze how your function probably looks like.
Obviously you can also work with dozens of so called "meta optimizers", which are just bunch of heuristics, which might (or not) work for you particular application. Those include random sampling starting point in a loop, using genetic algorithms, or - which is as far as I know most mathematically justified approach - using Bayesian optimization. In general the idea is to model your function in the same time when you try to minimize it, this way you can make informed guess where to start next time (which is level of abstraction higher than random guessing or using genetic algorithms/differential evolution). Thus, I would order these methods in following way
grid search / random sampling - uses no information from previous runs, thus - worst results
genetic approach, evolutionary, basin-hooping, annealing - use information from previous runs as a (x, f(x)) pairs, for limited period of time (generations) - thus average results
Bayesian optimization (and similar methods) - use information from all previous experiences through modeling of the underlying function and performing sampling selection based on expected improvement - best results (at the cost of most complex methods)

Parallel many dimensional optimization

I am building a script that generates input data [parameters] for another program to calculate. I would like to optimize the resulting data. Previously I have been using the numpy powell optimization. The psuedo code looks something like this.
def value(param):
run_program(param)
#Parse output
return value
scipy.optimize.fmin_powell(value,param)
This works great; however, it is incredibly slow as each iteration of the program can take days to run. What I would like to do is coarse grain parallelize this. So instead of running a single iteration at a time it would run (number of parameters)*2 at a time. For example:
Initial guess: param=[1,2,3,4,5]
#Modify guess by plus minus another matrix that is changeable at each iteration
jump=[1,1,1,1,1]
#Modify each variable plus/minus jump.
for num,a in enumerate(param):
new_param1=param[:]
new_param1[num]=new_param1[num]+jump[num]
run_program(new_param1)
new_param2=param[:]
new_param2[num]=new_param2[num]-jump[num]
run_program(new_param2)
#Wait until all programs are complete -> Parse Output
Output=[[value,param],...]
#Create new guess
#Repeat
Number of variable can range from 3-12 so something such as this could potentially speed up the code from taking a year down to a week. All variables are dependent on each other and I am only looking for local minima from the initial guess. I have started an implementation using hessian matrices; however, that is quite involved. Is there anything out there that either does this, is there a simpler way, or any suggestions to get started?
So the primary question is the following:
Is there an algorithm that takes a starting guess, generates multiple guesses, then uses those multiple guesses to create a new guess, and repeats until a threshold is found. Only analytic derivatives are available. What is a good way of going about this, is there something built already that does this, is there other options?
Thank you for your time.
As a small update I do have this working by calculating simple parabolas through the three points of each dimension and then using the minima as the next guess. This seems to work decently, but is not optimal. I am still looking for additional options.
Current best implementation is parallelizing the inner loop of powell's method.
Thank you everyone for your comments. Unfortunately it looks like there is simply not a concise answer to this particular problem. If I get around to implementing something that does this I will paste it here; however, as the project is not particularly important or the need of results pressing I will likely be content letting it take up a node for awhile.
I had the same problem while I was in the university, we had a fortran algorithm to calculate the efficiency of an engine based on a group of variables. At the time we use modeFRONTIER and if I recall correctly, none of the algorithms were able to generate multiple guesses.
The normal approach would be to have a DOE and there where some algorithms to generate the DOE to best fit your problem. After that we would run the single DOE entries parallely and an algorithm would "watch" the development of the optimizations showing the current best design.
Side note: If you don't have a cluster and needs more computing power HTCondor may help you.
Are derivatives of your goal function available? If yes, you can use gradient descent (old, slow but reliable) or conjugate gradient. If not, you can approximate the derivatives using finite differences and still use these methods. I think in general, if using finite difference approximations to the derivatives, you are much better off using conjugate gradients rather than Newton's method.
A more modern method is SPSA which is a stochastic method and doesn't require derivatives. SPSA requires much fewer evaluations of the goal function for the same rate of convergence than the finite difference approximation to conjugate gradients, for somewhat well-behaved problems.
There are two ways of estimating gradients, one easily parallelizable, one not:
around a single point, e.g. (f( x + h directioni ) - f(x)) / h;
this is easily parallelizable up to Ndim
"walking" gradient: walk from x0 in direction e0 to x1,
then from x1 in direction e1 to x2 ...;
this is sequential.
Minimizers that use gradients are highly developed, powerful, converge quadratically (on smooth enough functions).
The user-supplied gradient function
can of course be a parallel-gradient-estimator.
A few minimizers use "walking" gradients, among them Powell's method,
see Numerical Recipes p. 509.
So I'm confused: how do you parallelize its inner loop ?
I'd suggest scipy fmin_tnc
with a parallel-gradient-estimator, maybe using central, not one-sided, differences.
(Fwiw,
this
compares some of the scipy no-derivative optimizers on two 10-d functions; ymmv.)
I think what you want to do is use the threading capabilities built-in python.
Provided you your working function has more or less the same run-time whatever the params, it would be efficient.
Create 8 threads in a pool, run 8 instances of your function, get 8 result, run your optimisation algo to change the params with 8 results, repeat.... profit ?
If I haven't gotten wrong what you are asking, you are trying to minimize your function one parameter at the time.
you can obtain it by creating a set of function of a single argument, where for each function you freeze all the arguments except one.
Then you go on a loop optimizing each variable and updating the partial solution.
This method can speed up by a great deal function of many parameters where the energy landscape is not too complex (the dependency between the parameters is not too strong).
given a function
energy(*args) -> value
you create the guess and the function:
guess = [1,1,1,1]
funcs = [ lambda x,i=i: energy( guess[:i]+[x]+guess[i+1:] ) for i in range(len(guess)) ]
than you put them in a while cycle for the optimization
while convergence_condition:
for func in funcs:
optimize fot func
update the guess
check for convergence
This is a very simple yet effective method of simplify your minimization task. I can't really recall how this method is called, but A close look to the wikipedia entry on minimization should do the trick.
You could do parallel at two parts: 1) parallel the calculation of single iteration or 2) parallel start N initial guessing.
On 2) you need a job controller to control the N initial guess discovery threads.
Please add an extra output on your program: "lower bound" that indicates the output values of current input parameter's decents wont lower than this lower bound.
The initial N guessing thread can compete with each other; if any one thread's lower bound is higher than existing thread's current value, then this thread can be dropped by your job controller.
Parallelizing local optimizers is intrinsically limited: they start from a single initial point and try to work downhill, so later points depend on the values of previous evaluations. Nevertheless there are some avenues where a modest amount of parallelization can be added.
As another answer points out, if you need to evaluate your derivative using a finite-difference method, preferably with an adaptive step size, this may require many function evaluations, but the derivative with respect to each variable may be independent; you could maybe get a speedup by a factor of twice the number of dimensions of your problem. If you've got more processors than you know what to do with, you can use higher-order-accurate gradient formulae that require more (parallel) evaluations.
Some algorithms, at certain stages, use finite differences to estimate the Hessian matrix; this requires about half the square of the number of dimensions of your matrix, and all can be done in parallel.
Some algorithms may also be able to use more parallelism at a modest algorithmic cost. For example, quasi-Newton methods try to build an approximation of the Hessian matrix, often updating this by evaluating a gradient. They then take a step towards the minimum and evaluate a new gradient to update the Hessian. If you've got enough processors so that evaluating a Hessian is as fast as evaluating the function once, you could probably improve these by evaluating the Hessian at every step.
As far as implementations go, I'm afraid you're somewhat out of luck. There are a number of clever and/or well-tested implementations out there, but they're all, as far as I know, single-threaded. Your best bet is to use an algorithm that requires a gradient and compute your own in parallel. It's not that hard to write an adaptive one that runs in parallel and chooses sensible step sizes for its numerical derivatives.

Categories

Resources