I have a list of haversine distances between customers and their respective salespersons and I would like to apply the TSP algorithm to optimize each salesperson's distance traveled in a given day. What would be the best approach to solving this problem in R or Python?
Note: I do not need to visualize this through any map, I just need to shortest distance traveled between each customer starting and ending with the salesperson location.
Well, there are a lot of strategies you can use to solve this problem, such as (1) approximation algorithms, (2) exact approaches, and, of course, (3) heuristic/metaheuristic approaches. Note that, the optimal solution of a given instance is guaranteed to be achieved only using exact approaches.
Regarding each strategy, bellow follows some links that might help you:
Approximation algorithms: There is a famous approximation algorithm for the metric TSP, i.e. when the graph is metric, called Christofides algorithm. But for a general graph, there is no approximation algorithm, unless P = NP (you can check the proof of this theorem here, at section 2);
Exact approaches: For the TSP, this strategy usually divides itself into two categories (but not limited): dynamic programming and integer programming;
Heuristic/Metaheuristic approaches: I will not enter in details about it since there are a big number of heuristics and metaheuristics available for the TSP (you can check it by yourself here).
As you can see, my answer is very open, since your question is very open. So, if you want to get a more precise answer, you need to specify exactly what you need/want.
I have a simple solver which I am using to solve a knapsack-like problem. I am looking to maximize a value while keeping constraints in mind
self.solver = pywraplp.Solver(
'FD',
pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING
)
self.objective = self.solver.Objective()
self.objective.SetMaximization()
self.solver.solve()
I left out the code to define variables, but my question is: Running this code will get me the optimal lineup. Is there a way to find the 2nd, 3rd, etc. best solution?
With CBC no.
CPLEX, Gurobi do support keeping more solutions, but this is available in OR-Tools only for Gurobi through the NextSolution() method.
If your model is purely integral, you can have a look at the CP-SAT solver.
The trick is the unless you explore all solutions, the second best solution is heuristic at best.
In a knapsack like problem it is straight forward to obtain the next best solution in an iterative procedure.
After the problem was solved for the first time, you can add a constraint where the left hand side sums over all items included in the optimal solution and the right hand side limits this sum to one less than the number of items included in the optimal solution.
This is essentially a cut which excludes the first optimal solution from the solution space. Thus, a different solution will be obtained by solving the problem after adding the additional constraint.
Say I have to solve for a large system of equations where
A_i = f(B_i)
B_i = g(A_i)
for many different i. Now, this is a system of equations which are only pair-wise dependent. The lm algorythm has proven most stable to solve this.
Now, I could solve these either independently (i.e. loop over i many scipy.optimize.root, or stack them all together and solve at the same time). I'm unsure which will be the fastest, and it's difficult to know generally. I'm having the following arguments for and against:
The algorythm initially numerically approximates the Jacobian at the provided guess, increasing dimensionality exponentially increases the time it takes to find the Jacobian (speaks against stacking)
Once the Jacobian is found, most of the updating is linear matrix algebra, and therefore should be faster if stacked.
Does that make sense? My conclusion would in that case be "if solving it takes a long time (bad guess or irregular function), stack them, if it's quick, do not stack".
I am not sure I understand correctly; when you say that they are pairwise dependent do you mean that the full system can be decomposed in a collection of small 2x2 systems? If so, you should definitely opt for solving the smaller systems. If not, can you provide some equations?
I am trying to create an algorithm using the divide-and-conquer approach but using an iterative algorithm (that is, no recursion).
I am confused as to how to approach the loops.
I need to break up my problems into smaller sub problems, until I hit a base case. I assume this is still true, but then I am not sure how I can (without recursion) use the smaller subproblems to solve the much bigger problem.
For example, I am trying to come up with an algorithm that will find the closest pair of points (in one-dimensional space - though I intend to generalize this on my own to higher dimensions). If I had a function closest_pair(L) where L is a list of integer co-ordinates in ℝ, how could I come up with a divide and conquer ITERATIVE algorithm that can solve this problem?
(Without loss of generality I am using Python)
The cheap way to turn any recursive algorithm into an iterative algorithm is to take the recursive function, put it in a loop, and use your own stack. This eliminates the function call overhead and from saving any unneeded data on the stack. However, this is not usually the "best" approach ("best" depends on the problem and context.)
They way you've worded your problem, it sounds like the idea is to break the list into sublists, find the closest pair in each, and then take the closest pair out of those two results. To do this iteratively, I think a better way to approach this than the generic way mentioned above is to start the other way around: look at lists of size 3 (there are three pairs to look at) and work your way up from there. Note that lists of size 2 are trivial.
Lastly, if your coordinates are integers, they are in Z (a much smaller subset of R).
I am building a script that generates input data [parameters] for another program to calculate. I would like to optimize the resulting data. Previously I have been using the numpy powell optimization. The psuedo code looks something like this.
def value(param):
run_program(param)
#Parse output
return value
scipy.optimize.fmin_powell(value,param)
This works great; however, it is incredibly slow as each iteration of the program can take days to run. What I would like to do is coarse grain parallelize this. So instead of running a single iteration at a time it would run (number of parameters)*2 at a time. For example:
Initial guess: param=[1,2,3,4,5]
#Modify guess by plus minus another matrix that is changeable at each iteration
jump=[1,1,1,1,1]
#Modify each variable plus/minus jump.
for num,a in enumerate(param):
new_param1=param[:]
new_param1[num]=new_param1[num]+jump[num]
run_program(new_param1)
new_param2=param[:]
new_param2[num]=new_param2[num]-jump[num]
run_program(new_param2)
#Wait until all programs are complete -> Parse Output
Output=[[value,param],...]
#Create new guess
#Repeat
Number of variable can range from 3-12 so something such as this could potentially speed up the code from taking a year down to a week. All variables are dependent on each other and I am only looking for local minima from the initial guess. I have started an implementation using hessian matrices; however, that is quite involved. Is there anything out there that either does this, is there a simpler way, or any suggestions to get started?
So the primary question is the following:
Is there an algorithm that takes a starting guess, generates multiple guesses, then uses those multiple guesses to create a new guess, and repeats until a threshold is found. Only analytic derivatives are available. What is a good way of going about this, is there something built already that does this, is there other options?
Thank you for your time.
As a small update I do have this working by calculating simple parabolas through the three points of each dimension and then using the minima as the next guess. This seems to work decently, but is not optimal. I am still looking for additional options.
Current best implementation is parallelizing the inner loop of powell's method.
Thank you everyone for your comments. Unfortunately it looks like there is simply not a concise answer to this particular problem. If I get around to implementing something that does this I will paste it here; however, as the project is not particularly important or the need of results pressing I will likely be content letting it take up a node for awhile.
I had the same problem while I was in the university, we had a fortran algorithm to calculate the efficiency of an engine based on a group of variables. At the time we use modeFRONTIER and if I recall correctly, none of the algorithms were able to generate multiple guesses.
The normal approach would be to have a DOE and there where some algorithms to generate the DOE to best fit your problem. After that we would run the single DOE entries parallely and an algorithm would "watch" the development of the optimizations showing the current best design.
Side note: If you don't have a cluster and needs more computing power HTCondor may help you.
Are derivatives of your goal function available? If yes, you can use gradient descent (old, slow but reliable) or conjugate gradient. If not, you can approximate the derivatives using finite differences and still use these methods. I think in general, if using finite difference approximations to the derivatives, you are much better off using conjugate gradients rather than Newton's method.
A more modern method is SPSA which is a stochastic method and doesn't require derivatives. SPSA requires much fewer evaluations of the goal function for the same rate of convergence than the finite difference approximation to conjugate gradients, for somewhat well-behaved problems.
There are two ways of estimating gradients, one easily parallelizable, one not:
around a single point, e.g. (f( x + h directioni ) - f(x)) / h;
this is easily parallelizable up to Ndim
"walking" gradient: walk from x0 in direction e0 to x1,
then from x1 in direction e1 to x2 ...;
this is sequential.
Minimizers that use gradients are highly developed, powerful, converge quadratically (on smooth enough functions).
The user-supplied gradient function
can of course be a parallel-gradient-estimator.
A few minimizers use "walking" gradients, among them Powell's method,
see Numerical Recipes p. 509.
So I'm confused: how do you parallelize its inner loop ?
I'd suggest scipy fmin_tnc
with a parallel-gradient-estimator, maybe using central, not one-sided, differences.
(Fwiw,
this
compares some of the scipy no-derivative optimizers on two 10-d functions; ymmv.)
I think what you want to do is use the threading capabilities built-in python.
Provided you your working function has more or less the same run-time whatever the params, it would be efficient.
Create 8 threads in a pool, run 8 instances of your function, get 8 result, run your optimisation algo to change the params with 8 results, repeat.... profit ?
If I haven't gotten wrong what you are asking, you are trying to minimize your function one parameter at the time.
you can obtain it by creating a set of function of a single argument, where for each function you freeze all the arguments except one.
Then you go on a loop optimizing each variable and updating the partial solution.
This method can speed up by a great deal function of many parameters where the energy landscape is not too complex (the dependency between the parameters is not too strong).
given a function
energy(*args) -> value
you create the guess and the function:
guess = [1,1,1,1]
funcs = [ lambda x,i=i: energy( guess[:i]+[x]+guess[i+1:] ) for i in range(len(guess)) ]
than you put them in a while cycle for the optimization
while convergence_condition:
for func in funcs:
optimize fot func
update the guess
check for convergence
This is a very simple yet effective method of simplify your minimization task. I can't really recall how this method is called, but A close look to the wikipedia entry on minimization should do the trick.
You could do parallel at two parts: 1) parallel the calculation of single iteration or 2) parallel start N initial guessing.
On 2) you need a job controller to control the N initial guess discovery threads.
Please add an extra output on your program: "lower bound" that indicates the output values of current input parameter's decents wont lower than this lower bound.
The initial N guessing thread can compete with each other; if any one thread's lower bound is higher than existing thread's current value, then this thread can be dropped by your job controller.
Parallelizing local optimizers is intrinsically limited: they start from a single initial point and try to work downhill, so later points depend on the values of previous evaluations. Nevertheless there are some avenues where a modest amount of parallelization can be added.
As another answer points out, if you need to evaluate your derivative using a finite-difference method, preferably with an adaptive step size, this may require many function evaluations, but the derivative with respect to each variable may be independent; you could maybe get a speedup by a factor of twice the number of dimensions of your problem. If you've got more processors than you know what to do with, you can use higher-order-accurate gradient formulae that require more (parallel) evaluations.
Some algorithms, at certain stages, use finite differences to estimate the Hessian matrix; this requires about half the square of the number of dimensions of your matrix, and all can be done in parallel.
Some algorithms may also be able to use more parallelism at a modest algorithmic cost. For example, quasi-Newton methods try to build an approximation of the Hessian matrix, often updating this by evaluating a gradient. They then take a step towards the minimum and evaluate a new gradient to update the Hessian. If you've got enough processors so that evaluating a Hessian is as fast as evaluating the function once, you could probably improve these by evaluating the Hessian at every step.
As far as implementations go, I'm afraid you're somewhat out of luck. There are a number of clever and/or well-tested implementations out there, but they're all, as far as I know, single-threaded. Your best bet is to use an algorithm that requires a gradient and compute your own in parallel. It's not that hard to write an adaptive one that runs in parallel and chooses sensible step sizes for its numerical derivatives.