`the shortest path that connects several points` NP-complete algorithms - python

I am reading "Grokking Algorithms" and understand Dijkstra's algorithm and greedy algorithms,
but I get lost when the author compares them with NP-complete problems:
But it’s hard to tell if a problem you’re working on is NP-complete. Usually there’s a very small difference between a problem that’s easy to solve and an NP-complete problem. For example, in the previous chapters, I talked a lot about shortest paths. You know how to calculate the shortest way to get from point A to point B.
But if you want to find the shortest path that connects several points, that’s the traveling-salesperson problem, which is NP-complete. The short answer: there’s no easy way to tell if the problem you’re working on is NP-complete. Here are some giveaways:
The sentence:
But if you want to find the shortest path that connects several points,
What are "the several points"?
I cannot see how this differs from a basic Dijkstra's algorithm problem.

I think he means a path through a subset of the nodes of a graph. (Think of the worst case of 'several points'.)
Note that for any fixed number of points, say k = 3 or k = 3000, on a graph of n nodes, the problem stays polynomial in n, just as it is for two points: compute the pairwise shortest paths and try all k! orderings, where k! is a constant. The hardness appears only when k is allowed to grow with n. While some people may think that "several" is never greater than seven, or maybe seven dozen or seven billion, it is neither a matter of fact nor an exact science.
Less likely, though possible, he meant the usual formulation of the traveling salesman problem (visiting all the nodes of a connected graph). It is NP-complete either way.
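To make the contrast concrete, here is a minimal sketch. It assumes a small undirected graph stored as a dict of dicts; the graph, its weights, and the function names are made up for illustration. The two-point case is one Dijkstra run, while the several-points case tries every visiting order, which is where the factorial blow-up comes from. The route is allowed to pass through a node more than once.

```python
import heapq
from itertools import permutations


def dijkstra(graph, source):
    """Return shortest distances from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


def shortest_route_through(graph, points):
    """Brute force: try every visiting order of the given points."""
    # Pairwise shortest distances: one Dijkstra run per point.
    pair = {p: dijkstra(graph, p) for p in points}
    best_cost, best_order = float("inf"), None
    for order in permutations(points):  # k! orderings
        cost = sum(pair[a][b] for a, b in zip(order, order[1:]))
        if cost < best_cost:
            best_cost, best_order = cost, order
    return best_cost, best_order


# Hypothetical toy graph (undirected, weights invented for illustration).
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}

two_point = dijkstra(graph, "A")["D"]  # A to D only
several = shortest_route_through(graph, ["A", "B", "C", "D"])
```

With 4 points the loop checks 24 orderings; with 20 points it would already be about 2.4 * 10^18, which is why this approach does not scale.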

Related

Design an algorithm to find the average distance between two paths

I have a database of time-stamped points which represent a path being drawn by a user in a 2-D plane. I also have a list of points which represent the goal path. These are not timestamped. I want to find how accurate the users' drawn paths are as compared to the goal path. The parameter to define accuracy is not clear and something I'm trying to decide. I don't really care about the temporal aspect of the user drawn path. I only want to compare the two paths.
I'm doing this to do the analysis for an experiment done by a behavioral lab. This is my current algorithm.
Find the total length of the user-drawn path by adding the straight-line distances between consecutive points.
At every 1% of the total length of both the user path and the goal path, find the straight-line distance between the two corresponding points.
Average the 100 distances to get the average distance between the two paths.
Increase the sampling frequency if I want a more accurate number.
I'm only looking for algorithmic help, since implementing this would be quite trivial. My issue is that I'm not sure whether I'm missing something here, nor am I sure of the correctness of this algorithm, so I wanted to run it by some experienced programmers.
I'm not a programmer by trade but this data analysis is essential for the paper the lab is working on. I'm not sure if I need to be familiar with some higher level Math which makes this trivial.
I'm completely language agnostic and would appreciate pointers to any existing algorithms or novel solutions that solve this problem.
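The sampling procedure described above could be sketched roughly as follows. This is only an illustration under some assumptions: both paths are lists of (x, y) tuples, points are matched by the fraction of path length travelled, and both paths run in the same direction. The function names are invented for this sketch. It samples at 0%, 1%, ..., 100%, so it averages 101 distances rather than 100.

```python
import math


def cumulative_lengths(path):
    """Running arc length at each vertex of a polyline."""
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    return lengths


def point_at_fraction(path, lengths, fraction):
    """Point located at the given fraction (0..1) of the path's total length."""
    target = fraction * lengths[-1]
    for i in range(1, len(path)):
        if lengths[i] >= target:
            segment = lengths[i] - lengths[i - 1]
            # Interpolate linearly inside the segment that contains the target.
            t = 0.0 if segment == 0 else (target - lengths[i - 1]) / segment
            (x0, y0), (x1, y1) = path[i - 1], path[i]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return path[-1]


def average_distance(user_path, goal_path, samples=100):
    """Mean straight-line distance between matching fractions of both paths."""
    user_len = cumulative_lengths(user_path)
    goal_len = cumulative_lengths(goal_path)
    total = 0.0
    for k in range(samples + 1):
        f = k / samples
        ux, uy = point_at_fraction(user_path, user_len, f)
        gx, gy = point_at_fraction(goal_path, goal_len, f)
        total += math.hypot(ux - gx, uy - gy)
    return total / (samples + 1)
```

One thing to watch: because points are paired by fraction of length, a user path that wobbles a lot is longer than the goal path, so the pairing drifts even when the user stays close to the goal line.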

How can I use the Traveling Salesperson Problem (TSP) with a List of Haversine Distances?

I have a list of haversine distances between customers and their respective salespersons and I would like to apply the TSP algorithm to optimize each salesperson's distance traveled in a given day. What would be the best approach to solving this problem in R or Python?
Note: I do not need to visualize this on a map; I just need the shortest distance traveled between the customers, starting and ending at the salesperson's location.
Well, there are a lot of strategies you can use to solve this problem, such as (1) approximation algorithms, (2) exact approaches, and, of course, (3) heuristic/metaheuristic approaches. Note that only exact approaches are guaranteed to find the optimal solution of a given instance.
Regarding each strategy, below are some links that might help you:
Approximation algorithms: There is a famous approximation algorithm for the metric TSP, i.e. when the distances satisfy the triangle inequality, called the Christofides algorithm. But for a general graph, there is no polynomial-time constant-factor approximation algorithm unless P = NP (you can check the proof of this theorem here, at section 2);
Exact approaches: For the TSP, this strategy usually falls into two categories (though it is not limited to them): dynamic programming and integer programming;
Heuristic/Metaheuristic approaches: I will not go into detail here, since there is a large number of heuristics and metaheuristics available for the TSP (you can check them yourself here).
As you can see, my answer is very open, since your question is very open. So, if you want to get a more precise answer, you need to specify exactly what you need/want.
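As an illustration of the dynamic programming option among the exact approaches, here is a sketch of the Held-Karp algorithm. It assumes the haversine distances have already been arranged into a square matrix `dist`, with index 0 standing for the salesperson's location; that layout and the function name are assumptions for this example, not something given in the question.

```python
from itertools import combinations


def held_karp(dist):
    """Exact TSP tour starting and ending at node 0 (dynamic programming).

    dist is a square matrix of pairwise distances. Running time grows as
    O(n^2 * 2^n), so this is only practical for roughly 20 stops or fewer.
    """
    n = len(dist)
    if n == 1:
        return 0.0, [0, 0]
    # best[(mask, j)] = (cost, predecessor) of the cheapest path that starts
    # at 0, visits exactly the nodes whose bits are set in mask, and ends at j.
    best = {(1 << j, j): (dist[0][j], 0) for j in range(1, n)}
    for size in range(2, n):
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev_mask = mask & ~(1 << j)
                best[(mask, j)] = min(
                    (best[(prev_mask, k)][0] + dist[k][j], k)
                    for k in subset if k != j
                )
    full = (1 << n) - 2  # every node except 0
    cost, last = min((best[(full, j)][0] + dist[j][0], j) for j in range(1, n))
    # Walk the predecessor links backwards to recover the visiting order.
    route, mask = [0], full
    while last != 0:
        route.append(last)
        mask, last = mask & ~(1 << last), best[(mask, last)][1]
    route.append(0)
    return cost, route[::-1]
```

For a salesperson with a dozen or so customers per day this finishes quickly; beyond about 20 stops the table of subsets becomes too large and a heuristic is the more realistic choice.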

Routing problems with a large amount of points and one constraint

I am currently tackling a routing problem where I have to create daily schedules for workers who repair installations. There are 200,000 installations, and a worker can only work 8 hours per day. The goal is to make optimal routes on a daily basis, that is, to minimize the distance between the points a worker has to visit each day, but there is also a constraint on the priority of each installation: each installation has a priority between 0 and 1, and higher-priority points should be given higher weights.
I am just looking for some suggestions as I have tried implementing some solutions (https://developers.google.com/optimization/routing/tsp) but due to the many points I have, this results in too long computation time.
Thank you.
Best regards,
Charles
As you know, there is no perfect answer to your issue, but maybe I can guide your research:
Alpha-beta pruning: I've used it to reduce the number of possibilities for an AI playing the game Hex.
A* pathfinding: I've used it to simulate a futuristic hyperloop-like capsule-based network, as a complement to Dijkstra's algorithm.
You can customize both algorithms according to your needs.
Hope this helps!
Due to the large scale of the described problem, it is nearly impossible to achieve the optimal solution for each case. You could try something based on mixed integer programming, especially a TSP or vehicle routing problem (VRP) formulation, but I assume that it won't work in your case.
What you should try, at least in my opinion, are heuristic approaches for solving TSP/VRP: tabu search, simulated annealing, hill climbing. Given enough time and a proper set of constraints, one of these methods would produce "good enough" solutions, which are much better than random guessing. Take a look at something like Google OR-Tools.
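As a rough illustration of the hill-climbing idea, here is a sketch of the classic 2-opt move on a closed tour. It assumes the points are plain (x, y) tuples with Euclidean distances and ignores the 8-hour limit and the priorities; the function names are made up for this example. Recomputing the whole tour length for each candidate is wasteful and is done here only to keep the code short.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(points, tour):
    """Hill climbing: keep reversing segments while that shortens the tour."""
    tour = list(tour)
    best = tour_length(points, tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(tour) - 1):
            for j in range(i + 1, len(tour)):
                # Reverse the segment tour[i..j] and keep it if it is shorter.
                candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
                length = tour_length(points, candidate)
                if length < best - 1e-12:
                    tour, best, improved = candidate, length, True
    return tour, best
```

The result is a local optimum, not necessarily the best tour; simulated annealing and tabu search are ways of escaping such local optima.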
That's a massive problem. You will need to cluster it into smaller subproblems before tackling it. We've applied sophisticated fuzzy clustering techniques to experimentally solve a 20,000-location problem. For 200,000 you'll probably need to aggregate by geographic region (e.g. postcode/zip code) before you can attempt to run some kind of clustering to split it up. Alternatively, you may just want to try a hard split based on geography first.
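A hard geographic split can be as simple as bucketing locations into grid cells. The sketch below assumes each installation is a (lat, lon) pair in degrees and that a fixed cell size in degrees is acceptable; both are assumptions for illustration, and a real split would more likely follow postcodes or travel times.

```python
from collections import defaultdict


def split_by_grid(locations, cell_size):
    """Group (lat, lon) pairs into square cells of cell_size degrees.

    Returns a dict mapping a (row, column) cell key to the list of indices
    of the locations that fall inside that cell.
    """
    cells = defaultdict(list)
    for index, (lat, lon) in enumerate(locations):
        key = (int(lat // cell_size), int(lon // cell_size))
        cells[key].append(index)
    return dict(cells)
```

Each cell can then be routed on its own with whatever TSP/VRP solver copes with a few hundred points. Note that a degree of longitude covers less ground at higher latitudes, so the cells are not equal in area.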

Efficient scheduling of university courses

I'm currently working on a website that will allow students from my university to automatically generate valid schedules based on the courses they'd like to take.
Before working on the site itself, I decided to tackle the issue of how to schedule the courses efficiently.
A few clarifications:
Each course at our university (and I assume at every other university) comprises one or more sections. So, for instance, Calculus I currently has 4 sections available. This means that the number of sections, and whether or not the course has a lab, drastically affects the scheduling process.
Courses at our university are represented using a combination of subject abbreviation and course code. In the case of Calculus I: MATH 1110.
The CRN is a code unique to a section.
The university I study at is not mixed, meaning males and females study in (almost) separate campuses. What I mean by almost is that the campus is divided into two.
The datetimes and timeranges dicts are meant to reduce calls to datetime.datetime.strptime(), which was a real bottleneck.
My first attempt consisted of the algorithm looping continuously until 30 schedules were found. Schedules were created by randomly choosing a section from one of the inputted courses, and then trying to place sections from the remaining courses to try to construct a valid schedule. If not all of the courses fit into the schedule i.e. there were conflicts, the schedule was scrapped and the loop continued.
Clearly, the above solution is flawed. The algorithm took too long to run, and relied too much on randomness.
The second algorithm does the exact opposite of the old one. First, it generates a collection of all possible schedule combinations using itertools.product(). It then iterates through the schedules, crossing off any that are invalid. To ensure assorted sections, the schedule combinations are shuffled (random.shuffle()) before being validated. Again, there is a bit of randomness involved.
After a bit of optimization, I was able to get the scheduler to run in under 1 second for an average schedule consisting of 5 courses. That's great, but the problem begins once you start adding more courses.
To give you an idea, when I provide a certain set of inputs, the number of possible combinations is so large that itertools.product() does not terminate in a reasonable amount of time, and eats up 1GB of RAM in the process.
Obviously, if I'm going to make this a service, I'm going to need a faster and more efficient algorithm. Two that have popped up online and in IRC: dynamic programming and genetic algorithms.
Dynamic programming cannot be applied to this problem because, if I understand the concept correctly, it involves breaking up the problem into smaller pieces, solving these pieces individually, and then bringing the solutions of these pieces together to form a complete solution. As far as I can see, this does not apply here.
As for genetic algorithms, I do not understand them much, and cannot even begin to fathom how to apply one in such a situation. I also understand that a GA would be more efficient for an extremely large problem space, and this is not that large.
What alternatives do I have? Is there a relatively understandable approach I can take to solve this problem? Or should I just stick to what I have and hope that not many people decide to take 8 courses next semester?
I'm not a great writer, so I'm sorry for any ambiguities in the question. Please feel free to ask for clarification and I'll try my best to help.
Here is the code in its entirety.
http://bpaste.net/show/ZY36uvAgcb1ujjUGKA1d/
Note: Sorry for using a misleading tag (scheduling).
Scheduling is a very famous constraint satisfaction problem that is generally NP-Complete. A lot of work has been done on the subject, even in the same context as you: Solving the University Class Scheduling Problem Using Advanced ILP Techniques. There are even textbooks on the subject.
People have taken many approaches, including:
Dynamic programming
Genetic algorithms
Neural networks
You need to reduce your problem space and complexity. Make as many assumptions as possible (maximum number of classes, block-based timing, etc.). There is no silver bullet for this problem, but it should be possible to find a near-optimal solution.
Some semi-recent publications:
QUICK scheduler a time-saving tool for scheduling class sections
Scheduling classes on a College Campus
Did you ever read anything about genetic programming? The idea behind it is that you let the 'thing' you want solved evolve, just by itself, until it has grown into the best solution(s) possible.
You generate a thousand schedules, of which usually zero are anywhere in the right direction of being valid. Next, you change 'some' courses, randomly. From these new schedules you select some of the best, based on ratings you give according to the 'goodness' of the schedule. Next, you let them reproduce, by combining some of the courses on both schedules. You end up with a thousand new schedules, but all of them a tiny fraction better than the ones you had. Let it repeat until you are satisfied, and select the schedule with the highest rating from the last thousand you generated.
There is randomness involved, I admit, but the schedules keep getting better, no matter how long you let the algorithm run. Just like real life and organisms there is survival of the fittest, and it is possible to view the different general 'threads' of the same kind of schedule, that is about as good as another one generated. Two very different schedules can finally 'battle' it out by cross breeding.
A project involving school schedules and genetic programming:
http://www.codeproject.com/Articles/23111/Making-a-Class-Schedule-Using-a-Genetic-Algorithm
I think they explain pretty well what you need.
My final note: I think this is a very interesting project. It is quite difficult to make, but once done it is just great to see your solution evolve, just like real life. Good luck!
The way you're currently generating combinations of sections is probably throwing up huge numbers of combinations that are excluded by conflicts between more than one course. I think you could reduce the number of combinations that you need to deal with by generating the product of the sections for only two courses first. Eliminate the conflicts from that set, then introduce the sections for a third course. Eliminate again, then introduce a fourth, and so on. This should see a more linear growth in the processing time required as the number of courses selected increases.
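The course-by-course elimination described above could look roughly like this. The data layout is an assumption made for the sketch: each course is a list of sections, and each section is a dict with a "crn" and a list of (day, start, end) meetings, with times as plain integers such as 930. The original code behind the bpaste link may be organized quite differently.

```python
def conflicts(section_a, section_b):
    """True if any meeting of the two sections overlaps on the same day."""
    return any(
        day_a == day_b and start_a < end_b and start_b < end_a
        for day_a, start_a, end_a in section_a["meetings"]
        for day_b, start_b, end_b in section_b["meetings"]
    )


def build_schedules(courses):
    """Add one course at a time, dropping partial schedules that conflict."""
    partial = [[]]  # start with a single empty schedule
    for sections in courses:
        partial = [
            schedule + [section]
            for schedule in partial
            for section in sections
            if not any(conflicts(section, chosen) for chosen in schedule)
        ]
    return partial


# Hypothetical data: three courses with made-up CRNs and meeting times.
calc = [{"crn": 1, "meetings": [("Mon", 900, 1000)]},
        {"crn": 2, "meetings": [("Tue", 900, 1000)]}]
phys = [{"crn": 3, "meetings": [("Mon", 930, 1030)]}]
chem = [{"crn": 4, "meetings": [("Tue", 930, 1030)]},
        {"crn": 5, "meetings": [("Wed", 900, 1000)]}]

valid = build_schedules([calc, phys, chem])
crns = [[section["crn"] for section in schedule] for schedule in valid]
```

How much this saves depends on how many conflicts there are: when sections clash often, most partial schedules die early, but if almost nothing conflicts the list still grows to the full product.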
This is a hard problem. If you google something like 'course scheduling problem paper' you will find a lot of references. Genetic algorithm - no, dynamic programming - yes. GAs are much harder to understand and implement than standard DP algorithms. Usually people who use GAs out of the box don't understand standard techniques. Do some research and you will find different algorithms. You might be able to find some implementations. Coming up with your own algorithm is way, way harder than putting some effort into understanding DP.
The problem you're describing is a Constraint Satisfaction Problem. My approach would be the following:
Check whether there are any incompatibilities between courses; if so, record them as constraints or arcs.
While no solution is found:
Select the course with the fewest constraints (that is, the fewest incompatibilities with other courses).
Run the AC-3 algorithm to reduce the search space.
I've tried this approach with sudoku solving and it worked (it solved the hardest sudoku in the world in less than 10 seconds).
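For reference, the AC-3 step mentioned above can be sketched as below. The representation is an assumption for this example: `domains` maps each course to its candidate sections, `neighbors` maps each course to the courses it shares a constraint with, and `compatible` is a hypothetical callback that says whether two chosen sections can coexist. AC-3 only prunes values; a backtracking search is still needed afterwards to pick one section per course.

```python
from collections import deque


def ac3(domains, neighbors, compatible):
    """Prune domain values that have no compatible partner in a neighbor.

    Modifies domains in place. Returns False if some domain becomes empty
    (no solution exists), True otherwise.
    """
    queue = deque((x, y) for x in domains for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        unsupported = [
            vx for vx in domains[x]
            if not any(compatible(x, vx, y, vy) for vy in domains[y])
        ]
        if unsupported:
            domains[x] = [v for v in domains[x] if v not in unsupported]
            if not domains[x]:
                return False
            # x lost values, so arcs pointing at x must be checked again.
            queue.extend((z, x) for z in neighbors[x] if z != y)
    return True


# Hypothetical example: section "m1" clashes with the only PHYS section.
clashes = {("m1", "p1")}
domains = {"MATH": ["m1", "m2"], "PHYS": ["p1"]}
neighbors = {"MATH": ["PHYS"], "PHYS": ["MATH"]}


def no_clash(x, vx, y, vy):
    return (vx, vy) not in clashes and (vy, vx) not in clashes


consistent = ac3(domains, neighbors, no_clash)
```

After the call, "m1" has been removed from the MATH domain because no PHYS section is compatible with it.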

Shortest total path among set of Latitude/Longitudes

I have a set of 52 or so latitude/longitude pairs. I simply need to find the shortest path through all of them; it doesn't matter where the starting point or ending point is.
I've implemented Dijkstra's algorithm by hand multiple times before and don't really have the time to do it again. I've found a couple things that come close, but most require raw graphs with pre-computed weights for each edge.
Do you know of any libraries or existing scripts/applications which will compute the shortest path in this manner? The code/libraries would preferably use Python or Clojure but it really doesn't matter.
Thanks
If this is a closed path, it is the Traveling Salesman Problem, and a sub-optimal but quite effective way to solve it is to use Simulated Annealing.
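A minimal simulated annealing sketch for this kind of input is shown below. It treats the route as an open path, since the question says the start and end do not matter, and it computes haversine distances directly from the coordinates so no pre-computed edge weights are needed. The function names, the Earth radius of 6371 km, and the temperature and cooling values are illustrative assumptions that would need tuning; it needs at least two points.

```python
import math
import random


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def path_length(points, order):
    """Length of the open path visiting points in the given order."""
    return sum(haversine_km(points[a], points[b])
               for a, b in zip(order, order[1:]))


def anneal(points, steps=20000, start_temp=1000.0, cooling=0.9995, seed=0):
    """Simulated annealing over visiting orders, using segment reversals."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    current = path_length(points, order)
    best_order, best = order[:], current
    temp = start_temp
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(points)), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        length = path_length(points, candidate)
        # Always accept improvements; accept worse paths with a probability
        # that shrinks as the temperature drops.
        if length < current or rng.random() < math.exp((current - length) / temp):
            order, current = candidate, length
            if current < best:
                best_order, best = order[:], current
        temp *= cooling
    return best_order, best
```

Calling `anneal(points)` with the 52 coordinate pairs returns a visiting order and its length in kilometres. The result is not guaranteed to be optimal, and different seeds can give different paths.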
In Python, the best graph-handling library I was able to put my hands on is networkx. It supports a broad range of shortest-path algorithms.
Go for it. It's really complete and well designed.
Isn't this the Traveling Salesman Problem, and therefore there is no efficient way to solve it?
